ABSTRACT



The Shear TEsting Programme, STEP, is a collaborative project to improve the accuracy 
and reliability of all weak lensing measurements in preparation for the next generation of 
wide-field surveys. In this first STEP paper we present the results of a blind analysis of simu- 
lated ground-based observations of relatively simple galaxy morphologies. The most success- 
ful methods are shown to achieve percent level accuracy. From the cosmic shear pipelines that 
have been used to constrain cosmology, we find weak lensing shear measured to an accuracy 
that is within the statistical errors of current weak lensing analyses, with shear measurements 
accurate to better than 7%. The dominant source of measurement error is shown to arise from 
calibration uncertainties where the measured shear is over or under-estimated by a constant 
multiplicative factor. This is of concern as calibration errors cannot be detected through stan- 
dard diagnostic tests. The measured calibration errors appear to result from stellar contami- 
nation, false object detection, the shear measurement method itself, selection bias and/or the 
use of biased weights. Additive systematics (false detections of shear) resulting from residual 
point-spread function anisotropy are, in most cases, reduced to below an equivalent shear of 
0.001, an order of magnitude below cosmic shear distortions on the scales probed by current 
surveys. 

Our results provide a snapshot view of the accuracy of current ground-based weak lensing 
methods and a benchmark upon which we can improve. To this end we provide descriptions of 
each method tested and include details of the eight different implementations of the commonly
used Kaiser et al. (1995) method (KSB+) to aid the improvement of future KSB+ analyses.

1 INTRODUCTION

Gravitational lensing provides an unbiased way to study the dis- 
tribution of matter in the Universe. Derived from the physics of 
gravity, where gravitational light deflection is dependent solely 
on the distribution of matter, weak gravitational lens theory de- 
scribes a unique way to directly probe dark matter on large scales 
(see the extensive review by Bartelmann & Schneider 2001). This
tool has many astronomical applications; the detection of weak
shear around galaxy clusters yields an estimate of the total cluster
mass (see for example Wittman et al. 2003; Margoniner et al. 2005)
and enables a full mass reconstruction of low redshift clusters
(see for example Clowe et al. 2004; Gray et al. 2002; Dahle et al.
2002); the average weak tangential shear of distant galaxies around
nearby galaxies constrains the ensemble average properties of dark
matter halos (see for example Hoekstra et al. 2004; Sheldon et al.
2004); the weak lensing of background galaxies by foreground
large-scale structure directly probes the evolution of the non-linear
matter power spectrum, hence providing a signal that can constrain
cosmological parameters (see review by Van Waerbeke & Mellier
2003). This last application has the great promise of being able
to tightly constrain the properties of dark energy with the next
generation of wide-field multi-colour surveys (Jain & Taylor 2003;
Bernstein & Jain 2004; Benabed & Van Waerbeke 2004; Heavens
2003; Refregier et al. 2004).

Technically, weak lensing is rather challenging to detect. It re- 
quires the measurement of the weak distortion that lensing induces 
in the shapes of observed galaxy images. These images have been 
convolved with the point spread function (PSF) distortion of the 
atmosphere, telescope and camera. The accuracy of any analysis 
therefore depends critically on the correction for instrumental dis- 
tortions and atmospheric seeing. Weak lensing by large-scale struc- 
ture induces percent level correlations in the observed ellipticities 
of galaxies, termed 'cosmic shear'. This cosmological application 
of weak lensing theory is therefore the most demanding technically, 
owing to the fact that for any weak lensing survey, the instrumen- 
tal distortions are an order of magnitude larger than the underlying 
cosmic shear distortion that we wish to detect. We therefore focus 
on the demands of this particular application even though our find- 
ings will be beneficial to all weak lensing studies. 

The unique qualities of weak lensing as a dark matter
and dark energy probe demand that all technical challenges
are met and overcome, and this desire has led to
the development of some of the most innovative methods
in astronomy. The first pioneering weak lensing measurement
methods by Tyson et al. (1990), Bonnet & Mellier (1995) and
Kaiser et al. (1995) (KSB) have improved (Luppino & Kaiser
1997; Hoekstra et al. 1998) (KSB+) and diversified (Rhodes et al.
2000; Kaiser 2000; Bridle et al. 2001; Bernstein & Jarvis 2002;
Refregier & Bacon 2003; Massey & Refregier 2004). Novel methods
to model the spatial and temporal variation of the PSF have
also been designed to improve the success of the PSF correction
(Hoekstra 2004; Jarvis & Jain 2004). In addition, diagnostic techniques
have been developed and implemented to provide indicators
for the presence of residual systematic non-lensing distortions
(Bacon et al. 2003; Crittenden et al. 2002; Schneider et al. 2002;
Brown et al. 2003).

Rapid technical development has mirrored the growth in observational
efforts with the cosmic shear analysis of several wide-field
optical surveys yielding joint constraints on the matter density
parameter Ω_m and the amplitude of the matter power spectrum
σ_8 (Maoli et al. 2001; Rhodes et al. 2001; Van Waerbeke et al.
2005; Semboloni et al. 2005). The results from these efforts are
found to be in broad agreement and are fast becoming more credible
with the most recent publications presenting the results from
several different diagnostic tests to determine the levels of systematic
error. Table 1 lists the most recent cosmic shear results
from different authors or surveys, the two-point statistics used
in the cosmological parameter analysis and the statistics used to
determine levels of systematic errors through an E/B mode decomposition
(Crittenden et al. 2002). See Schneider et al. (2002)
and Brown et al. (2003) for details about each two-point statistic
and their E/B mode decomposition and Massey et al. (2005),
Van Waerbeke et al. (2005) and Heymans et al. (2005) for different
discussions on which statistics are best to use. For such a young
field of observational research, the ~ 2σ agreement between the
results, shown in Table 1, is rather impressive. The differences between
the results are, however, often cited as a reason for caution
over the use of cosmic shear as a cosmological probe. For this reason
the Shear TEsting Programme^1 (STEP) was launched in order
to improve the accuracy and reliability of all future weak lensing
measurements through the rigorous testing of shear measurement
pipelines, the exchange of data and the sharing of technical and
theoretical knowledge within the weak lensing community.

The current differences seen in cosmic shear cosmological parameter
estimates could result from a number of sources: inaccurate
source redshift distributions that are required to interpret the cosmic
shear signal; sampling variance; systematic errors from residual instrumental
distortions; calibration biases in the shear measurement
method. Contamination to cosmic shear analyses from the intrinsic
galaxy alignment of nearby galaxies is currently thought to be
a weak effect that is measured and mitigated in Heymans et al.
(2004) (also see King & Schneider 2003; Heymans & Heavens
2003; King & Schneider 2002 and references therein). With the
next generation of wide-field multi-colour surveys many of these
problems can swiftly be resolved as the multi-colour photometric
redshifts will provide a good estimate of the redshift distribution
(see for example Brown et al. 2003) and the wide areas will minimise
sampling variance. In addition, all new instrumentation has
been optimised to reduce the severity of instrumental distortions,
improving the accuracy of future PSF corrections. Implementing
diagnostic statistics that decompose cosmic shear signals into their
lensing E-modes and non-lensing B-modes (Crittenden et al. 2002;
Schneider et al. 2002; Brown et al. 2003) immediately alerts us to
the presence of systematic error within our data set. B-mode systematics
can then be reduced through the modification of PSF models
(Van Waerbeke et al. 2005; Jarvis & Jain 2004) or merely the
selection of angular scales above or below which the systematics
are removed. Calibration bias is therefore perhaps of greatest concern
as, in contrast to additive PSF errors, it can only be directly
detected through the cosmic shear analysis of image simulations,
although see the discussion on self-calibration in Huterer et al.
(2005), and Hirata et al. (2004) and Mandelbaum et al. (2005) for
model-dependent estimates of shear calibration errors in the Sloan
Digital Sky Survey. With the statistics currently used to place
constraints on cosmological parameters, a shear calibration error
contributes directly to an error in σ_8. The recent development of
statistics which are fairly insensitive to shear calibration errors
(Jain & Taylor 2003; Bernstein 2006) is certainly one solution to
this potential problem. Also see Ishak (2005), where shear calibration
uncertainties are marginalised over in the cosmological parameter
estimation.

Bacon et al. (2001), Erben et al. (2001) and Hoekstra et al.
(2002) presented the first detailed cosmic shear analyses of artificial
image simulations using the KSB+ method. Bacon et al. (2001)
found that the KSB+ method was reliable to ~ 5% provided a calibration
factor of 0.85 was included in the analysis to increase the
KSB+ shear estimator. The calibration factor has since been included
in the work of Bacon et al. (2003), Brown et al. (2003) and
Massey et al. (2005) who implement the KSB+ pipeline tested in
Bacon et al. (2001). Erben et al. (2001) found that depending on
the PSF type tested and the chosen implementation of the KSB+
formula, described in Section 2.1, the KSB+ method was reliable
to ±10-15% and did not require a calibration correction. The
artificial images tested by Hoekstra et al. (2002) included cosmic
shear derived from ray-tracing simulations. They found that the input
lensing signal could be recovered to better than 10% of the
input value. The difference between these three conclusions is important.
All papers adopted the same KSB+ method, but subtle differences
in their implementation resulted in the need for a calibration
correction in one case but not in the others. It is therefore not
sufficient to cite these papers to support the KSB+ method as every
individual's KSB+ pipeline implementation may differ slightly,
introducing a discrepancy between the results.
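
As an illustration of what a calibration factor means in practice, the following minimal sketch fits a linear relation between input and measured shear and then rescales the measurements. The parametrisation gamma_meas = (1 + m) gamma_true + c, the symbols m and c, and both function names are assumptions introduced here for illustration; they are not taken from the pipelines cited above. A factor of 0.85 corresponds to m = -0.15 in this notation.

```python
def fit_calibration(gamma_true, gamma_meas):
    """Least-squares fit of gamma_meas = (1 + m) * gamma_true + c.

    Returns (m, c): m is a multiplicative calibration bias and c an
    additive offset (both names are illustrative assumptions).
    """
    n = len(gamma_true)
    mean_t = sum(gamma_true) / n
    mean_m = sum(gamma_meas) / n
    sxx = sum((t - mean_t) ** 2 for t in gamma_true)
    sxy = sum((t - mean_t) * (g - mean_m)
              for t, g in zip(gamma_true, gamma_meas))
    slope = sxy / sxx
    return slope - 1.0, mean_m - slope * mean_t


def apply_calibration(gamma_meas, factor):
    """Divide by a calibration factor (e.g. 0.85) to raise an
    underestimated shear back towards its input value."""
    return [g / factor for g in gamma_meas]
```

Such a fit is only possible when the input shear is known, which is why image simulations are needed to detect this kind of error.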

For the cosmic shear, galaxy-galaxy lensing and cluster mass 
determinations published to date, ~ 10% errors are at worst comparable
to the statistical errors and are not dominant. Much larger
surveys now underway will, however, reduce statistical errors on 
various shear measurements to the ~ 2% level, requiring shear 
measurement accurate to ~ 1%. In the next decade, deep weak- 
lensing surveys of thousands of square degrees will require shear 
measurements accurate to ~ 0.1%. The technical challenges as- 
sociated with measuring weak lensing shear must therefore be ad- 
dressed and solved in a relatively short period of time. 

Whilst KSB+ is currently the most widely used weak lensing
method, promising alternative methods have been developed
[Rhodes et al. 2000 (RRG); Kaiser 2000 (K2K); Smith 2000
(ellipto); Bridle et al. 2001 (Im2shape); Bernstein & Jarvis 2002
(BJ02); Refregier 2003 (shapelets); Massey & Refregier 2004 (polar
shapelets)] and implemented in cosmic shear analyses [see
for example Rhodes et al. 2004 (RRG); Wittman et al. 2001 (ellipto);
Jarvis et al. 2003 and Jarvis et al. 2005 (BJ02); Chang et al.
2004 (shapelets)], and cluster lensing studies [see for example
Bardeau et al. 2004 (Im2shape); Dahle et al. 2002 (K2K);
Margoniner et al. 2005 (ellipto)]. Thorough testing of these newer
techniques is however somewhat lacking in the literature, although
see Refregier & Bacon (2003) and Massey et al. (2004) for tests of
the shapelets method.

In this paper we present the first of the STEP initiatives: the
blind^2 analysis of sheared image simulations with a variety of weak
lensing measurement pipelines used by each author in their previously
published work. Authors and methods are listed in Table 2.
Modifications to pipelines used in published work have not been allowed
in light of the results and we thus present our results openly

Author              Key       Method
Bridle & Hudelot    SB        Im2shape (Bridle et al. 2001)
Brown               MB        KSB+ [Bacon et al. (2000) pipeline]
Clowe               C1 & C2   KSB+
Dahle               HD        K2K (Kaiser 2000)
Hetterscheidt       MH        KSB+ [Erben et al. (2001) pipeline]
Heymans             CH        KSB+
Hoekstra            HH        KSB+
Jarvis              MJ        Bernstein & Jarvis (2002): rounding kernel method
Kuijken             KK        Shapelets to 12th order, Kuijken (2006)
Margoniner          VM        Wittman et al. (2001)
Nakajima            RN        Bernstein & Jarvis (2002): deconvolution fitting method
Schrabback          TS        KSB+ [Erben et al. (2001) + modifications]
Van Waerbeke        LV        KSB+

Table 2. Table of authors and methods. The key identifies the authors in all 
future plots and Tables. 

to provide the reader with a snapshot view of how accurately we 
can currently measure weak lensing shear from galaxies with rela- 
tively simple morphologies. This paper will thus provide a bench- 
mark upon which we can improve in future STEP initiatives. Note 
that some of the methods evaluated in this paper are experimental
and/or in early stages of development, notably the methods
of Kuijken (2006), the deconvolution fitting method of Nakajima
(2005, in preparation), and the Dahle implementation of K2K. The
results from these particular methods should therefore not be taken 
as a judgment on their ultimate potential. 

This paper is organised as follows. In Section 2 we review
the different shear measurement methods used by each author and
describe the simulated data set in Section 3. We compare each author's
measured shear with the input simulation shear in Section 4,
investigating forms of calibration bias, selection bias and weight
bias. Note that our discussion on the issue of source selection bias
is indeed relevant for many different types of survey analysis, not
only the lensing applications detailed here. We discuss our findings
in Section 5 and conclude in Section 6.



2 METHODS 

In the weak lensing limit the ellipticity of a galaxy is an un- 
biased estimate of the gravitational shear. For a perfect ellipse 
with axial ratio β at position angle θ, measured counter-clockwise
from the x axis, we can define the following ellipticity parameters
(Bonnet & Mellier 1995)

\begin{equation}
\begin{pmatrix} e_1 \\ e_2 \end{pmatrix}
  = \frac{1-\beta}{1+\beta}
    \begin{pmatrix} \cos 2\theta \\ \sin 2\theta \end{pmatrix},
\tag{1}
\end{equation}

and the complex ellipticity e = e_1 + i e_2. In the case of weak shear
|γ| ≪ 1, the shear γ = γ_1 + i γ_2 is directly related to the average

Survey analysis             Pipeline description        σ_8            Statistic        E/B decomposition           Area (deg²)  z_m
Hoekstra et al. (2002)      Hoekstra et al. (1998)      0.86 +…/−…     ⟨M_ap²⟩          ⟨M_ap²⟩, ⟨M_⊥²⟩             53.0         0.54-0.66
Refregier et al. (2002)     Rhodes et al. (2000)        0.94 ± 0.24    ⟨γ²⟩             ⟨M_ap²⟩, ⟨M_⊥²⟩             0.36 (s)     0.9 ± 0.1
Brown et al. (2003)         Bacon et al. (2000)         0.72 ± 0.09    ξ±, P^κκ         P^κκ, P^κβ, P^ββ            1.25         0.85 ± 0.05
Hamana et al. (2003)        Hamana et al. (2003)        …              ⟨M_ap²⟩          ⟨M_ap²⟩, ⟨M_⊥²⟩             2.1          0.6-1.4
Rhodes et al. (2004)        Rhodes et al. (2000)        1.02 ± 0.16    ⟨γ²⟩             none                        0.25 (s)     1.0 ± 0.1
Van Waerbeke et al. (2005)  Van Waerbeke et al. (2000)  0.83 ± 0.07    ⟨M_ap²⟩, ξ^E     ⟨M_ap²⟩, ⟨M_⊥²⟩, ξ^E, ξ^B   8.5          0.8-1.0
Jarvis et al. (2005)        Bernstein & Jarvis (2002)   0.72 +…/−…     ⟨γ²⟩, ⟨M_ap²⟩    ⟨M_ap²⟩, ⟨M_⊥²⟩             75.0         0.6 ± 0.1
Massey et al. (2005)        Bacon et al. (2000)         1.02 ± 0.15    …                …                           4.5          0.8 ± 0.08
Heymans et al. (2005)       Heymans et al. (2005)       0.68 ± 0.13    ξ±, P^κκ         …                           0.22 (s)     1.0 ± 0.1

Table 1. The most recent cosmological parameter constraints on the amplitude of the matter power spectrum σ_8 from each author or survey, for a matter
density parameter Ω_m = 0.3. Quoted errors on σ_8 are 1σ (68% confidence) except in the case of Jarvis et al. (2005) where the errors given are 2σ (95%
confidence). Several different statistics have been used to constrain σ_8, as detailed, where ⟨M_ap²⟩ is the mass aperture statistic, ⟨γ²⟩ is the top-hat shear
variance, ξ± are the shear correlation functions and P^κκ is the shear power spectrum. The statistics used to determine the level of non-lensing B-modes in
each result are also listed where ⟨M_⊥²⟩ is the B-mode mass aperture statistic, ξ^E and ξ^B are E and B mode correlators, P^ββ is the B-mode shear power
spectrum, and P^κβ is the E/B cross power spectrum. See Schneider et al. (2002) and Brown et al. (2003) for details about each two-point statistic and their
E/B mode decomposition. The shear measurement pipeline that has been used for each result is listed for reference, along with the area of the survey and the
median redshift estimate of the survey z_m. Space-based surveys are denoted with an (s) in the area column.



galaxy ellipticity, γ ≈ ⟨e⟩. In this section we briefly review the different
measurement methods used in this STEP analysis to estimate
galaxy ellipticity in the presence of instrumental and atmospheric
distortion and hence obtain an estimate of the gravitational shear γ.
Common to all methods is the initial source detection stage, typically
performed using the SExtractor (Bertin & Arnouts 1996) software.
The peak finding tool hfindpeaks from the imcat^3 software is
used as an alternative in some KSB+ methods, listed in Appendix
Table A1. In order to characterise the PSF, stars are selected in all
cases from a magnitude-size plot.
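
The ellipticity definition of equation 1 and the weak-shear relation between shear and mean ellipticity can be sketched as follows. The function names are illustrative assumptions, and the average is a plain unweighted mean; it ignores PSF effects, noise and weighting, all of which the methods below must handle.

```python
import math


def ellipticity(beta, theta):
    """Return (e1, e2) of equation 1 for axial ratio beta and
    position angle theta (radians, counter-clockwise from the x axis)."""
    amplitude = (1.0 - beta) / (1.0 + beta)
    return amplitude * math.cos(2.0 * theta), amplitude * math.sin(2.0 * theta)


def mean_ellipticity(shapes):
    """Unweighted mean of (e1, e2) over a list of (beta, theta) pairs.

    In the weak-shear limit this average estimates the shear."""
    components = [ellipticity(beta, theta) for beta, theta in shapes]
    n = len(components)
    return (sum(e1 for e1, _ in components) / n,
            sum(e2 for _, e2 in components) / n)
```

For example, a circular object (β = 1) has zero ellipticity, and two identical ellipses at right angles to each other average to zero, as expected for randomly oriented unlensed sources.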

2.1 KSB+ Method 

Kaiser et al. (1995), Luppino & Kaiser (1997) and Hoekstra et al.
(1998) (KSB+) prescribe a method to invert the effects of the PSF
smearing and shearing, recovering a shear estimator uncontaminated
by the systematic distortion of the PSF.

Objects are parameterised according to their weighted
quadrupole moments

\begin{equation}
Q_{ij} = \frac{\int d^2\theta \, W(\theta) \, I(\theta) \, \theta_i \theta_j}
              {\int d^2\theta \, W(\theta) \, I(\theta)},
\tag{2}
\end{equation}

where I is the surface brightness of the object, θ is the angular distance
from the object centre and W is a Gaussian weight function
of scale length r_g, where r_g is some measurement of galaxy size.
For a perfect ellipse, the weighted quadrupole moments are related
to the weighted ellipticity parameters^4 ε_α by

\begin{equation}
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix}
  = \frac{1}{Q_{11}+Q_{22}}
    \begin{pmatrix} Q_{11}-Q_{22} \\ 2 Q_{12} \end{pmatrix}.
\tag{3}
\end{equation}
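
A minimal sketch of equations 2 and 3 on a pixel grid is given below. It is not taken from any of the tested pipelines: the object centre is supplied by hand, the integrals are replaced by pixel sums, and the helper `gaussian_image` is a hypothetical test object introduced only for this example.

```python
import math


def weighted_moments(image, xc, yc, r_g=None):
    """Pixel-sum version of equation 2 about the centre (xc, yc).

    image is a list of rows of surface brightness values. With r_g=None
    the weight is W = 1; otherwise W is a circular Gaussian of scale r_g.
    Returns (Q11, Q22, Q12)."""
    q11 = q22 = q12 = norm = 0.0
    for y, row in enumerate(image):
        for x, flux in enumerate(row):
            dx, dy = x - xc, y - yc
            w = 1.0 if r_g is None else math.exp(
                -(dx * dx + dy * dy) / (2.0 * r_g * r_g))
            q11 += w * flux * dx * dx
            q22 += w * flux * dy * dy
            q12 += w * flux * dx * dy
            norm += w * flux
    return q11 / norm, q22 / norm, q12 / norm


def ksb_ellipticity(q11, q22, q12):
    """Equation 3: ellipticity parameters from the quadrupole moments."""
    trace = q11 + q22
    return (q11 - q22) / trace, 2.0 * q12 / trace


def gaussian_image(size, sigma_x, sigma_y):
    """Hypothetical test object: an axis-aligned elliptical Gaussian."""
    c = (size - 1) / 2.0
    return [[math.exp(-0.5 * (((x - c) / sigma_x) ** 2
                              + ((y - c) / sigma_y) ** 2))
             for x in range(size)] for y in range(size)]
```

With W = 1 and an elliptical Gaussian of axial ratio β = 0.5, the sketch returns |ε| close to (1 − β²)/(1 + β²) = 0.6, the unweighted limit quoted in footnote 4; a finite r_g lowers the measured ellipticity, which is one reason the polarisability corrections described next are needed.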

3 www.ifa.hawaii.edu/~kaiser/imcat/

4 The KSB+ definition of galaxy ellipticity differs from equation 1. If the
weight function W(θ) = 1 in equation 2, the KSB+ ellipticity |ε| =
(1 − β²)/(1 + β²), where β is the axial ratio (see Bartelmann & Schneider
2001).



Kaiser et al. (1995) show that if the PSF distortion can be described
as a small but highly anisotropic distortion convolved with a large
circularly symmetric seeing disk, then the ellipticity of a PSF corrected
galaxy is given by

\begin{equation}
\varepsilon^{\rm cor}_\alpha = \varepsilon^{\rm obs}_\alpha
  - P^{\rm sm}_{\alpha\beta} \, p_\beta ,
\tag{4}
\end{equation}

where p is a vector that measures the PSF anisotropy, and P^sm
is the smear polarisability tensor given in Hoekstra et al. (1998).
p(θ) can be estimated from images of stellar objects at position θ
by noting that a star, denoted throughout this paper with *, imaged
in the absence of PSF distortions has zero ellipticity: ε^{*cor}_α = 0.
Hence,

\begin{equation}
p_\mu = \left( P^{{\rm sm}*} \right)^{-1}_{\mu\alpha}
        \varepsilon^{*{\rm obs}}_\alpha .
\tag{5}
\end{equation}

The isotropic effect of the atmosphere and weight function can be
accounted for by applying the pre-seeing shear polarisability tensor
correction P^γ, as proposed by Luppino & Kaiser (1997), such that

\begin{equation}
\varepsilon^{\rm cor}_\alpha = \varepsilon^{s}_\alpha
  + P^{\gamma}_{\alpha\beta} \, \gamma_\beta ,
\tag{6}
\end{equation}

where ε^s is the intrinsic source ellipticity and γ is the pre-seeing
gravitational shear. Luppino & Kaiser (1997) show that

\begin{equation}
P^{\gamma}_{\alpha\beta} = P^{\rm sh}_{\alpha\beta}
  - P^{\rm sm}_{\alpha\mu} \left( P^{{\rm sm}*} \right)^{-1}_{\mu\delta}
    P^{{\rm sh}*}_{\delta\beta} ,
\tag{7}
\end{equation}

where P^sh is the shear polarisability tensor given in Hoekstra et al.
(1998) and P^{sm*} and P^{sh*} are the stellar smear and shear polarisability
tensors respectively. Combining the PSF correction, equation 4,
and the P^γ seeing correction, the final KSB+ shear estimator
γ̂ is given by

\begin{equation}
\hat{\gamma}_\alpha = \left( P^{\gamma} \right)^{-1}_{\alpha\beta}
  \left[ \varepsilon^{\rm obs}_\beta - P^{\rm sm}_{\beta\mu} \, p_\mu \right] .
\tag{8}
\end{equation}

This method has been used by many of the authors although different
interpretations of the above formula have introduced some subtle
differences between each author's KSB+ implementation. For
this reason we provide precise descriptions of each KSB+ pipeline
in the Appendix.
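
The algebra of equations 5, 7 and 8 can be sketched with full 2 x 2 tensors as below. This is one possible reading of the formula, written for illustration only: the function names are assumptions, the polarisability tensors are taken as given inputs, and the choices that distinguish the real pipelines (for example how the stellar tensors are averaged or interpolated, or whether a tensor is replaced by half its trace) are deliberately left out.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(m, v):
    """Product of a 2 x 2 matrix with a 2-vector."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]


def psf_anisotropy(p_sm_star, eps_star_obs):
    """Equation 5: p = (P^sm*)^-1 eps^*obs, measured from a star."""
    return matvec(inv2(p_sm_star), eps_star_obs)


def pre_seeing_polarisability(p_sh, p_sm, p_sm_star, p_sh_star):
    """Equation 7: P^gamma = P^sh - P^sm (P^sm*)^-1 P^sh*."""
    corr = matmul(p_sm, matmul(inv2(p_sm_star), p_sh_star))
    return [[p_sh[i][j] - corr[i][j] for j in range(2)] for i in range(2)]


def ksb_shear(eps_obs, p_sm, p_gamma, p):
    """Equation 8: gamma_hat = (P^gamma)^-1 [eps^obs - P^sm p]."""
    smear = matvec(p_sm, p)
    eps_cor = [eps_obs[0] - smear[0], eps_obs[1] - smear[1]]
    return matvec(inv2(p_gamma), eps_cor)
```

For a single galaxy the estimate also contains the intrinsic ellipticity term of equation 6, so the shear is only recovered after averaging over many sources.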

2.2 K2K Method

One drawback of the KSB+ method is that for non-Gaussian PSF
distortions, the KSB PSF correction is mathematically poorly defined.
Kaiser (2000) (K2K) addresses this issue by properly accounting
for the effects of a realistic PSF. It also proposes measuring
shapes from images that have been convolved with a re-circularising
PSF, where the re-circularising PSF is a 90° rotation
of a modeled version of the PSF. Section 2.3.6 of Dahle et al.
(2002) provides a condensed description of the K2K shear estimator
which has been applied to the STEP simulations by Dahle (HD).
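
The effect of convolving with a 90° rotated PSF model can be illustrated at the level of unweighted second moments, which add under convolution: the rotation swaps Q11 and Q22 and changes the sign of Q12, so the combined PSF has zero quadrupole ellipticity. The sketch below is not the K2K estimator; the toy PSF and all function names are assumptions made for this example.

```python
import math


def elliptical_psf(size, sigma_a, sigma_b, angle):
    """Toy PSF model: an elliptical Gaussian rotated by angle (radians)."""
    c = (size - 1) / 2.0
    ca, sa = math.cos(angle), math.sin(angle)
    img = []
    for y in range(size):
        row = []
        for x in range(size):
            u = (x - c) * ca + (y - c) * sa
            v = -(x - c) * sa + (y - c) * ca
            row.append(math.exp(-0.5 * ((u / sigma_a) ** 2
                                        + (v / sigma_b) ** 2)))
        img.append(row)
    return img


def rotate90(img):
    """Rotate a square image by 90 degrees about its centre."""
    n = len(img)
    return [[img[x][n - 1 - y] for x in range(n)] for y in range(n)]


def convolve(a, b):
    """Full 2D convolution of two images stored as lists of rows."""
    rows, cols = len(a) + len(b) - 1, len(a[0]) + len(b[0]) - 1
    out = [[0.0] * cols for _ in range(rows)]
    for i, row_a in enumerate(a):
        for j, va in enumerate(row_a):
            for k, row_b in enumerate(b):
                for l, vb in enumerate(row_b):
                    out[i + k][j + l] += va * vb
    return out


def unweighted_ellipticity(img):
    """Ellipticity from unweighted second moments about the centroid."""
    total = sum(map(sum, img))
    xc = sum(x * v for row in img for x, v in enumerate(row)) / total
    yc = sum(y * v for y, row in enumerate(img) for v in row) / total
    q11 = q22 = q12 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            q11 += v * (x - xc) ** 2
            q22 += v * (y - yc) ** 2
            q12 += v * (x - xc) * (y - yc)
    return (q11 - q22) / (q11 + q22), 2.0 * q12 / (q11 + q22)


psf = elliptical_psf(15, 3.0, 2.0, 0.4)
effective = convolve(psf, rotate90(psf))
```

Here `effective` plays the role of the PSF of an image that has been convolved with the re-circularising kernel. Higher-order asymmetries of a realistic PSF are not removed by this argument, which only concerns the quadrupole.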



2.3 Shapelets 

The shapelets formalism of Refregier (2003) allows galaxy images
to be decomposed into orthogonal basis functions which transform
simply under a variety of operations, in particular shear and
(de)convolution. The expansion is based on a circular Gaussian, but
inclusion of higher orders allows general shapes to be described
well.

Kuijken (2006) uses the shapelets formalism of Refregier (2003)
to derive individual shape estimators that differ from
the method of Refregier & Bacon (2003). We briefly review this
method which is based on the 'constant ellipticity object' estimator
of Kuijken (1999), referring the reader to Kuijken (2006) for further
details. Each galaxy image is fitted as an intrinsically circular
source that has been sheared and then smeared by the PSF. These
operations are efficiently expressed in terms of shapelets as

\begin{equation}
G^{\rm model} = P \cdot \left( 1 + \gamma_1 S_1 + \gamma_2 S_2 \right) \cdot C ,
\tag{9}
\end{equation}

where G^model is the model for the galaxy image, P is the known
PSF convolution operator (expressed as a matrix operating on
shapelet coefficients), S_i are the first-order shear operators, γ_i are
the shear distortions that are fitted, and C is a general circular
source of arbitrary radial luminosity profile (expressed as a superposition
of shapelets). Note that P is determined from stellar objects
whose shapelet coefficients are interpolated separately across
the field of view to the position of each observed galaxy. Fitting
this model to each observed galaxy image yields a best-estimate
(γ_1, γ_2) shear distortion value for each galaxy, which can then be
averaged or correlated to yield shear estimators. In this paper, we
use γ_i = ⟨γ_i⟩/(1 − ⟨γ²⟩) as an estimate for the shear from the
ensemble population. The factor in the denominator is the response
of the average ellipticity of a population of elliptical sources to an
overall shear (BJ02). To cope with possible centroiding errors, an
arbitrary translation is included in the fit as well. The uncertainties
on the pixel values of each galaxy image can be propagated into the
shapelet coefficients, and to the estimates of the γ_i. This method is
exact for galaxies that are intrinsically circular or elliptical. Kuijken
(1999) shows that this method also works well for galaxies whose
ellipticity or position angle varies with radius.
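
The ensemble estimate quoted above can be sketched as follows. The function name is illustrative, the average is unweighted, and ⟨γ²⟩ is assumed here to mean the average of γ_1² + γ_2² over the fitted per-galaxy values; the text does not state whether the squared modulus or a single component is intended, so this should be read as one possible interpretation.

```python
def ensemble_shear(gamma1, gamma2):
    """Mean fitted distortion divided by the response 1 - <gamma^2>.

    gamma1, gamma2: per-galaxy best-fit shear distortion components.
    <gamma^2> is taken as the mean of gamma1**2 + gamma2**2 (an
    assumption made for this sketch)."""
    n = len(gamma1)
    mean1 = sum(gamma1) / n
    mean2 = sum(gamma2) / n
    mean_sq = sum(a * a + b * b for a, b in zip(gamma1, gamma2)) / n
    response = 1.0 - mean_sq
    return mean1 / response, mean2 / response
```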



2.4 Im2shape 

Im2shape (Bridle et al. 2001; Bridle et al. 2005, in prep) fits a sum
of elliptical Gaussians to each object image, taking into account
unknown background and noise levels. This approach follows that
suggested by Kuijken (1999).



SExtractor is used to define postage stamps containing each
object^5 and galaxies and stars are selected from the size-magnitude
plot from the SExtractor output. The galaxies are modeled
by Im2shape using two concentric Gaussians, with 6 free param- 
eters for the first Gaussian, and 2 additional free parameters (size 
and amplitude) for the second Gaussian. The noise is assumed to 
be uncorrelated, Gaussian and at the same level for all pixels in 
the postage stamp. The background level is assumed to be constant 
across the postage stamp. Including the noise and background lev- 
els there are 10 free galaxy parameters in total. Two Gaussians are 
used for the stars in all the images, except for PSF 2, for which the 
amplitude of the second Gaussian was found to be so small that one 
Gaussian was used instead. Where two Gaussians were used to fit 
the stars, the Gaussians were taken to have totally independent pa- 
rameters, with 12 free parameters for the Gaussians, plus the noise 
and background levels, making 14 free parameters in total. To estimate
these free parameters quickly and efficiently, Im2shape makes
use of the BayeSys engine (written by Skilling & Gull). This im- 
plements Markov-Chain Monte Carlo sampling (MCMC) which is 
used to obtain samples from the probability distribution of the un- 
known parameters. Estimates of the free parameters are then taken 
from the mean value of the parameter across the MCMC samples, 
and the uncertainties are taken from the standard deviation. With 
this data set the MCMC analysis takes ~ 15 seconds per galaxy
image on the COSMOS^6 supercomputer.

To account for the PSF a grid of 5 x 5 points was defined 
on each image, and the PSF at each point was estimated by taking 
the median parameters of the nearest five stars (note that Im2shape 
was run on all the stellar-like objects and cuts were then used to 
remove outliers). For each galaxy, the PSF shape was taken from 
the grid point closest to the galaxy in question. The trial galaxy pa- 
rameters were then combined with the PSF parameters analytically 
to calculate the convolved image shape. The intensity in the centre 
of each pixel is calculated and this is corrected for the integration 
over the pixel using the curvature of the Gaussian at the centre of 
the pixel (for both star and galaxy shape estimation). The final ellipticity
values for each galaxy (equation 1) are found from averaging
over all the MCMC samples. Only galaxies with ellipticity uncer- 
tainties less than 0.25 were included in the final catalogue, as for 
higher ellipticity uncertainties the error estimates are less reliable 
resulting from the probability distribution becoming less Gaussian. 
To obtain an estimate of the shear from these ellipticity estimates 
the ellipticities are weighted by the inverse square of the elliptic- 
ity uncertainties added in quadrature with the intrinsic ellipticity 
dispersion σ_e of the galaxies, found to be σ_e = 0.2.
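
The selection and weighting described above can be sketched as a weighted mean. The sketch assumes a single scalar ellipticity uncertainty per galaxy and reads the weight as w = 1/(σ² + σ_e²); the function and argument names are illustrative and are not part of Im2shape.

```python
def weighted_shear(e1, e2, sigma, sigma_int=0.2, max_sigma=0.25):
    """Weighted mean ellipticity as a shear estimate.

    e1, e2: per-galaxy ellipticity components (posterior means).
    sigma: per-galaxy ellipticity uncertainty (posterior std).
    Galaxies with sigma >= max_sigma are dropped; the rest are weighted
    by 1 / (sigma**2 + sigma_int**2)."""
    sum1 = sum2 = wsum = 0.0
    for a, b, s in zip(e1, e2, sigma):
        if s >= max_sigma:
            continue  # uncertainty too large for a reliable error estimate
        w = 1.0 / (s * s + sigma_int * sigma_int)
        sum1 += w * a
        sum2 += w * b
        wsum += w
    return sum1 / wsum, sum2 / wsum
```

Adding the intrinsic dispersion in quadrature keeps well-measured galaxies from dominating the average, since their shapes still scatter by about σ_e around the shear.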



2.5 Wittman method with ellipto 

This method uses a re-circularising kernel to eliminate PSF 
anisotropy, and 'adaptive' moments (moments weighted by the 
best-fit elliptical Gaussian) to characterise the ellipticity of the 
source galaxies. It is a partial implementation of BJ02, discussed
in Section 2.6, and primarily differs from BJ02 by using a simpler
re-circularising kernel.

SExtractor is used for initial object detection. SExtractor 
centroids and moments are then input to the ellipto program 

5 The postage stamps used for this analysis were 16 × 16 pixels centered
on the SExtractor position.

6 www.damtp.cam.ac.uk/cosmos, SGI Altix 3700, 1.3 GHz Madison processors
(Smith et al. 2001; Smith 2000) which measures the adaptive moments.
The ellipto program also re-measures the centroid and outputs an error
flag when the centroid differs from the SExtractor centroid. This 
typically happens with blended objects or those with nearby neigh- 
bours, whose measured shapes may not be trustworthy in any case. 
Stars are selected with an automatic routine which looks for a 
dense locus at a constant ellipto size. The selection is then visually 
checked. In real data, ~5% of images require manual tweaking of 
the star selection, although this manual stage was not required for 
the STEP simulations. The spatial variation of the adaptive mo- 
ments is then fit with a second-order polynomial for each CCD of 
each exposure. This fit is then used to generate a spatially varying
3 x 3 pixel re-circularising kernel, following Fischer & Tyson
(1997). Note that a 3 x 3 kernel may be too small to properly correct
a well-sampled, highly elliptical PSF; the practical limit appears to
be ~ 0.1 ellipticity. In those cases, the re-circularisation step may
be applied iteratively, mimicking the effect of larger kernels. For 
the STEP simulations, only PSF 3 required a second iteration, but 
three iterations were applied to all PSFs. 

After re-circularisation, the object detection and ellipto mea- 
surements are repeated to generate the final catalogue. Note that 
object detection on the re-circularised image in principle eliminates 
PSF-anisotropy-dependent selection bias. Objects are rejected from 
the final catalogue if: the ellipto error is non-zero; measured (pre- 
dilution-correction) scalar ellipticity > 0.6 (simulations show that, 
with ground-based seeing, most of these are blends of unrelated ob- 
jects); or size < 120% of the PSF size. The adaptive moments are 
then corrected for dilution by an isotropic PSF and a responsivity 
correction using the formulae of BJ02. Weighting is not applied to 
the data. Note that this method has been used for cluster analyses 
but not for any published cosmic shear results. 
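
The rejection criteria listed above amount to a simple filter on the catalogue. The following sketch uses hypothetical argument names; only the three cuts themselves (non-zero ellipto error flag, scalar ellipticity above 0.6 before dilution correction, size below 120% of the PSF size) are taken from the text.

```python
def keep_object(error_flag, ellipticity, size, psf_size,
                max_ellipticity=0.6, min_size_ratio=1.2):
    """Return True if an object passes the three catalogue cuts.

    error_flag: ellipto error flag (0 means no error).
    ellipticity: scalar ellipticity before the dilution correction.
    size, psf_size: object and PSF sizes in the same units."""
    if error_flag != 0:
        return False  # centroid mismatch, often blends or close neighbours
    if ellipticity > max_ellipticity:
        return False  # mostly blends of unrelated objects
    if size < min_size_ratio * psf_size:
        return False  # too small relative to the PSF
    return True
```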

2.6 Bernstein and Jarvis Method: BJ02 

The Jarvis (MJ) and Nakajima (RN) methods each extend the el- 
lipto technique by methods detailed in BJ02. Both are based upon 
expansions of the galaxy and PSF shapes into a series of orthogonal
2D Gaussian-based functions, the Gauss-Laguerre expansion, also
known as 'polar shapelets' in Massey & Refregier (2004). Both the
Jarvis (MJ) and Nakajima (RN) methods move beyond the approx- 
imation, inherent in both the ellipto and KSB methods, that the PSF 
asymmetry can be described as a first-order perturbation to a circu- 
lar PSF. The Jarvis (MJ) method applies 'rounding kernel' filters 
from size 3x3 pixels and up to the images in order to null several 
asymmetric Gauss-Laguerre coefficients of the PSF, not just the 
quadrupoles. Note that for PSF ellipticities of order ~ 0.1, a 3 x 3 
pixel kernel is sufficient to round out stars up to approximately 30 
pixels in diameter. The galaxy shapes are next measured by the
best-fit elliptical Gaussian; formulae proposed by Hirata & Seljak
(2003) are used to correct the observed shapes for the circularising
effect of the PSF.

The 'deconvolution fitting method' by Nakajima (RN) implements nearly
the full formalism proposed by BJ02, which is further elaborated in
Nakajima et al. (2005, in prep): the intrinsic shapes of galaxies are
modeled as Gauss-Laguerre expansions (to 8th order). These are then
convolved with the PSF and fit directly to the observed pixel values in
a similar fashion to Kuijken (1999). This should fully capture the
effect of highly asymmetric PSFs or galaxies, as well as the effects of
finite sampling. Note that both methods use the weighting scheme
described in section 5 of BJ02.

A difference between the BJ02 approaches and the Refregier & Bacon
(2003) shapelets implementation is that the latter uses a circular
Gaussian basis set, whereas the BJ02 method shears the basis functions
until they match the ellipticity of the galaxy. This in principle
eliminates the need to calculate the 'shear polarisabilities' that
appear in KSB.



3 STEP SIMULATION DATA 

For this analysis we have created an artificial set of survey images
using the SkyMaker programme^7. A detailed description of this software
and the galaxy catalogue generator, Stuff^8, can be found in Erben et
al. (2001) and Bertin & Fouque (in prep) and we therefore only provide
a brief summary here. In short, for a given cosmology and survey
description, galaxies are distributed in redshift space with a
luminosity and morphological-size distribution as defined by
observational and semi-analytical relations. Galaxies are made of a
co-axial de Vaucouleurs-type spheroid bulge and a pure oblate circular
exponential thin disk (see Bertin & Arnouts 1996, for details). The
intrinsic flattening $q$ of spheroids is taken between 0.3 and 1, and
within this range follows a normal distribution with
$\langle q \rangle = 0.65$ and $\sigma_q = 0.18$ (Sandage et al. 1970).
Note that we assume the same flattening distribution for bulges and
ellipticals, even if there is some controversy about this (Boroson
1981). Inclination angles $i$ are randomly assigned following a flat
distribution, as expected from uniformly random orientations with
respect to the line of sight. The apparent axis ratio $\beta$ is given
by $\beta = \sqrt{q^2 \sin^2 i + \cos^2 i}$ for the spheroid component,
and by $\beta = \cos i$ for the thin disk. The bulge plus disk galaxy
is finally assigned a random position angle $\theta$ on the sky and the
bulge and disk intrinsic ellipticity parameters are then calculated
from the ellipticity definition given earlier.
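The axis-ratio recipe above can be illustrated with a minimal sketch. The
function name, the use of rejection sampling for the truncated normal, and the
inclination range of 0 to pi/2 are assumptions made for illustration; they are
not taken from the Stuff or SkyMaker code.

```python
import numpy as np

def sample_axis_ratios(n, rng, q_mean=0.65, q_sigma=0.18, q_min=0.3, q_max=1.0):
    """Draw apparent axis ratios for n bulge+disk galaxies (illustrative only)."""
    # Flattening q: normal distribution restricted to [q_min, q_max].
    # Rejection sampling is an assumed way of imposing that restriction.
    q = np.empty(0)
    while q.size < n:
        draw = rng.normal(q_mean, q_sigma, size=2 * n)
        q = np.concatenate([q, draw[(draw >= q_min) & (draw <= q_max)]])
    q = q[:n]
    # Flat distribution of inclination angles; the range 0..pi/2 is an assumption.
    inc = rng.uniform(0.0, np.pi / 2.0, size=n)
    # Apparent axis ratio of the spheroid and of the thin disk.
    beta_bulge = np.sqrt(q**2 * np.sin(inc)**2 + np.cos(inc)**2)
    beta_disk = np.cos(inc)
    return q, inc, beta_bulge, beta_disk
```

By construction the spheroid axis ratio lies between $q$ (edge-on) and 1
(face-on), while the thin disk ranges from 0 to 1.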

It has been known for some time that pure oblate circular disks,
oriented with a flat distribution of inclination angles, do not provide
a good match to the statistics from real disk galaxies (Binney & de
Vaucouleurs 1981; Grosbol 1985; Lambas et al. 1992): in particular,
observations show a striking deficiency of galaxies with zero
ellipticities. Although surface-brightness selection effects are not to
be ignored (see for example Huizinga & van Albada 1992), there is now
general agreement that this phenomenon mostly betrays intrinsic
ellipticities of disk planes. The origin of these intrinsic
ellipticities is not completely clear (see Binney & Merrifield 1998),
and is thought to originate partly from non-axisymmetric spiral
structures and/or a tri-axial potential (Rix & Zaritsky 1995). The
simulations used in this analysis ignore these aspects, and the
simulated galaxies are therefore intrinsically 'rounder' on average
than real galaxies. This should not impact on the lensing analysis that
follows, except in the cases where weighting schemes are used that take
advantage of the sensitivity of intrinsically circular galaxies to
measure weak lensing shear. These schemes will have an apparent
signal-to-noise advantage in the current simulations, which is expected
to decrease given real data.

A series of five different shears are applied to the galaxy catalogue
by modifying the observed intrinsic source ellipticity $e^{(s)}$ to
create sheared galaxies where

$$e = \frac{e^{(s)} + g}{1 + g^{*} e^{(s)}}$$

(Seitz & Schneider 1997) and $g$ is the complex reduced shear.
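As a minimal sketch of this step, the code below applies a complex reduced
shear to complex source ellipticities, assuming the standard Seitz & Schneider
(1997) transformation for $|g| < 1$, $e = (e^{(s)} + g)/(1 + g^{*} e^{(s)})$.
The function names are hypothetical and are not taken from the simulation
software.

```python
import numpy as np

def reduced_shear(gamma, kappa=0.0):
    """Reduced shear g = gamma / (1 - kappa)."""
    return gamma / (1.0 - kappa)

def apply_reduced_shear(e_source, g):
    """Shear complex source ellipticities e_source by complex reduced shear g.

    Assumes the |g| < 1 form of the Seitz & Schneider (1997) transformation.
    """
    e_source = np.asarray(e_source, dtype=complex)
    return (e_source + g) / (1.0 + np.conj(g) * e_source)
```

With this form an intrinsically round source ($e^{(s)} = 0$) acquires an
ellipticity equal to $g$, and a source with $|e^{(s)}| < 1$ keeps $|e| < 1$.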

7 http://terapix.iap.fr/cplt/oldSite/soft/skymaker

8 ftp://ftp.iap.fr/pub/from_users/bertin/stuff

PSF ID   PSF type                 Ellipticity
0        no anisotropy            0.00
1        coma                     ~ 0.04
2        jitter, tracking error   ~ 0.08
3        defocus                  ~ 0.00
4        astigmatism              ~ 0.00
5        triangular (trefoil)     0.00

Table 3. The SkyMaker simulations are convolved with this series of
uniform PSF models.

For this set of simulations, the convergence $\kappa = 0$, hence the
reduced shear $g = \gamma/(1-\kappa) = \gamma$, where
$\gamma_1 = (0.0, 0.005, 0.01, 0.05, 0.1)$ and $\gamma_2 = 0.0$.
Sheared bulge and disk axial ratios and position angles are then
calculated from the ellipticity definition and the model galaxy images
are created. Stars are simulated assuming a constant slope of 0.3 per
magnitude interval for the logarithm of differential stellar number
counts down to an I-band magnitude $I = 25$. Model galaxy images and
stellar point sources are then convolved with a series of six different
optical PSFs that are listed in Table 3 and shown in Figure 1. These
PSF models were chosen to provide a realistic representation of the
types of PSF distortions that are seen in ground-based observations,
through ray-tracing models of the optical plane. They also include
atmospheric turbulence, where the seeing scale is chosen such that when
the turbulence is combined with the PSF anisotropy, all stars have a
FWHM of 0.9 arcsec. The ellipticity of the PSF from real data is
typically of the order of 5%, which is similar to the coma model PSF 1.
PSF 2, which features a jitter or tracking error, is very elliptical in
comparison. The other PSF models test the impact of non-Gaussian PSF
distortions. A uniform background with surface brightness 19.2 mag
arcsec$^{-2}$ is added to the image, chosen to match the I-band sky
background at the Canada-France-Hawaii Telescope site. Poisson photon
shot noise and Gaussian read-out noise are then applied.

The combination of 6 different PSF types and 5 different applied shears
gives 30 different data sets, where each set consists of an ensemble of
64 images, each of 4096 x 4096 pixels with a pixel scale of 0.206
arcsec. For computational efficiency the data in each set stem from the
same base catalogue, and as the sky noise levels are the same for each
data set, many of the parameters required for the SExtractor source
detection software are the same for each data set. Aside from this
time-saving measure of setting some of the SExtractor source detection
parameters only once, prior information about the simulations has not
been used in the cosmic shear analyses. Each image contains ~ 15
galaxies per square arcminute, resulting in low level shot noise from
the intrinsic ellipticity distribution at the 0.1% level for each data
set. Stellar object density is ~ 10 stars per square arcminute, of
which roughly 150 per image were sufficiently bright for the
characterisation of the PSF. This density of stellar objects is
slightly higher than that found with typical survey data and was chosen
to aid PSF correction. It does however increase the likelihood of
stellar contamination in the selected galaxy catalogue. Although the
PSF is uniform across the field of view, uniformity has only been
assumed in one case (RN).
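The quoted 0.1% shot noise level can be checked with a back-of-envelope
estimate. The sketch below uses the image size, pixel scale and galaxy density
given above; the per-component intrinsic ellipticity dispersion `sigma_e = 0.3`
is an assumed illustrative value that is not stated in the text, and the
function name is hypothetical.

```python
import math

def shot_noise_per_data_set(n_images=64, n_pix=4096, pixel_scale_arcsec=0.206,
                            gal_density_per_arcmin2=15.0, sigma_e=0.3):
    """Rough shot noise on the mean ellipticity of one data set."""
    side_arcmin = n_pix * pixel_scale_arcsec / 60.0   # image side length
    area_arcmin2 = side_arcmin ** 2                   # area of one image
    n_gal = n_images * area_arcmin2 * gal_density_per_arcmin2
    # sigma_e is an assumption; the error on the mean scales as 1/sqrt(N).
    return area_arcmin2, n_gal, sigma_e / math.sqrt(n_gal)
```

Under these assumptions each image covers roughly 200 square arcminutes, a data
set holds of order 2 x 10^5 galaxies, and the shot noise on the mean
ellipticity is a little below 0.001.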

The reader should note that the SkyMaker simulations should, 
in principle, provide an easy test of our methods as many shear 
measurement methods are based on the assumption that the galaxy 
shape and PSF are smooth, elliptical and in some cases Gaussian. 




[Figure 1: two rows of panels, each showing PSF 0, PSF 1, PSF 2, PSF 3,
PSF 4 and PSF 5.]

Figure 1. SkyMaker PSF models, as described in Table 3. The upper panel
shows the PSF core distortion, with contours marking 3%, 25% and 90% of
the peak intensity. The lower panel shows the extended diffraction
spikes, with contours marking 0.003%, 0.03%, 0.3%, 3% and 25% of the
peak intensity.



In reality the shapes of faint galaxies can be quite irregular and,
particularly in the case of space-based observations, the PSF can
contain significant structure. In addition, the SkyMaker galaxies have
reflection symmetry about the centroid which could feasibly cause any
symmetrical errors to vanish. We should also note that some of the
authors have previously used SkyMaker simulations to test their methods
(see Erben et al. 2001; Hoekstra et al. 2002). These issues will
therefore be addressed by two future STEP publications with the blind
analysis of a more realistic set of artificial images that use shapelet
information to include complex galaxy morphology (Massey et al. 2004).
With these shapelet simulations we will investigate the shear recovery
from ground-based observations (Massey et al. in prep) and space-based
observations (Rhodes et al. in prep).



4 ANALYSIS 

In this section we compare each author's measured shear catalogues with
the input to each SkyMaker simulation. We match objects in each
author's catalogue to the input galaxy and stellar catalogue, within a
tolerance of 1 arcsec. Table 4 lists several general statistics
calculated from the PSF 0 (no anisotropy), $\gamma = (0.005, 0.0)$ set,
which is a good representation of the STEP simulation data. The source
extraction method used by each author is listed in Table 4, as well as
the average number density of selected sources per square arcmin,
Ngals. To minimise shot noise we wish to maximise the number of sources
without introducing false detections into the sample (note the
percentage of false detections listed in the '% false' column in Table
4) or contaminating the sample with stellar objects (note the
percentage of stellar contamination listed in the '% stars' column in
Table 4). Both false objects and stars add noise which can dilute the
average shear measurement. Typically the number of false detections is
negligible and the stellar contamination is below 5%. The notable
exception is the Dahle (HD) method, which suffers from strong stellar
contamination for all PSF types, a problem that can easily be improved
upon in future analyses. Where authors use object weights $w_i$ in
their analysis, the weighted percentage stellar contamination
(% stars' $= [\sum_{i={\rm stars}} w_i / \sum_{i={\rm all}} w_i] \times 100\%$)
and weighted percentage of false object contamination (% false') are
also listed. This shows, for example, that in the case of Hoekstra
(HH), the 10% stellar objects are given a very low weight and therefore
do not significantly contribute to the weighted average shear
measurement.

Author  Ngals (per arcmin^2)  % stars  % false  % stars'  % false'  Software                 SNR  S/N_s  S/N'_s
SB      18                    1.9      3.8      1.5       3.9       SExtractor               6    7      6
MB      14                    7.1      0.1      -         -         hfindpeaks               8    10     -
C1      12                    2.6      0.0      1.1       0.0       hfindpeaks & SExtractor  9    9      11
C2      12                    2.7      0.0      1.2       0.0       hfindpeaks & SExtractor  9    9      11
HD      17                    44.8     0.0      -         -         hfindpeaks               7    8      -
MH      14                    3.9      0.0      2.4       0.0       SExtractor               12   11     14
CH      12                    2.9      0.0      -         -         SExtractor               7    11     -
HH      16                    10.8     0.0      0.1       3.6       hfindpeaks               8    10     11
MJ      9                     0.0      3.6      0.0       1.0       SExtractor               16   8      22
KK      9                     0.8      0.0      0.3       0.0       SExtractor               10   10     12
VM      13                    3.8      0.0      -         -         SExtractor               10   10     -
RN      9                     0.9      0.4      1.5       0.1       SExtractor               19   10     24
TS      10                    1.4      0.0      0.9       0.0       SExtractor               12   11     14
LV      13                    0.0      0.0      0.0       0.0       SExtractor               11   11     12

Table 4. Comparison of the number density of selected sources per
square arcmin, Ngals, and the percentage of stellar contamination
(% stars) and false detections (% false) in each author's catalogue.
Each catalogue has been created using the SExtractor and/or the
hfindpeaks software. Where authors use object weights, the weighted
percentage of stellar contamination (% stars') and false detections
(% false') are also listed. The final columns give estimates of the
signal-to-noise of the resulting shear measurement as described in the
text. SNR $= \gamma_1^{\rm true}/\sigma_\gamma$ is the signal-to-noise
ratio of the shear measurement. S/N$_s$ is the signal-to-shot-noise
determined from the galaxies selected by each author. Where authors use
object weights, the signal-to-weighted-shot-noise S/N$'_s$ is also
determined.
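The weighted contamination percentages (% stars' and % false') follow directly
from the ratio of summed weights defined in the text. The sketch below is one
way to compute them; the function and argument names are hypothetical, and the
boolean flag is assumed to come from matching each catalogue object to the
input stellar (or false-detection) list.

```python
import numpy as np

def weighted_contamination_percent(weights, is_contaminant):
    """100 * sum(w_i over contaminants) / sum(w_i over all objects)."""
    w = np.asarray(weights, dtype=float)
    mask = np.asarray(is_contaminant, dtype=bool)
    return 100.0 * w[mask].sum() / w.sum()
```

With unit weights this reduces to the unweighted percentage, while
down-weighting the contaminants lowers their effective contribution.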

Average centroid offsets measured from each author's selected
catalogues were found to be < 0.001 pixels for SExtractor-based
catalogues and ~ 0.005 ± 0.001 pixels for hfindpeaks-based catalogues.
Centroid accuracy is however likely to be data dependent and S/N
dependent (see Erben et al. 2001). Thus care should still be taken in
determining centroids to prevent the problems described in Van Waerbeke
et al. (2005), where errors in the SExtractor centroiding in one field
were found to be the source of strong B-modes on large scales. Note
that starting from version 2.4.3, SExtractor provides iterative,
Gaussian-weighted centroid measurements XWIN_IMAGE and YWIN_IMAGE which
have been shown to be even more accurate than previous SExtractor
centroid measures (Bertin & Fouque in prep).



For each data set we calculate the mean (weighted) shear measured by
each author, treating each of the 64 images as an independent pointing.
We take the measured shear for each data set, $\gamma_i$, to be the
mean of the measurements from the 64 images and assign an error
$\sigma_\gamma$ given by the error on the mean. The final three columns
of Table 4 demonstrate the effect of weights and galaxy selection on
the signal-to-noise of the measurement. The signal-to-noise of the
shear measurement is defined as SNR $= \gamma_1^{\rm true}/\sigma_\gamma$,
where $\gamma_1^{\rm true}$ is the input shear
($\gamma_1^{\rm true} = 0.005$ for the data analysed in Table 4). The
signal-to-shot-noise is defined as S/N$_s = \gamma_1^{\rm true}/\sigma$,
where $\sigma$ is the error on the mean galaxy ellipticity $e$ (as
defined above) measured from the 64 images. Note that the shot noise
$\sigma$ is calculated from the known input ellipticities of galaxies
selected by each author. The final column applies to authors who use
weights, where the signal-to-weighted-shot-noise is defined as
S/N$'_s = \gamma_1^{\rm true}/\sigma'$, where $\sigma'$ is the error on
the mean weighted galaxy ellipticity.
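A minimal sketch of this bookkeeping is given below: the per-image shear
estimates of one data set are averaged, the error on the mean is taken as the
sample standard deviation divided by the square root of the number of images,
and the signal-to-noise is the input shear divided by that error. The function
name is hypothetical and the use of the unbiased (ddof=1) standard deviation is
an assumption.

```python
import numpy as np

def mean_shear_and_snr(per_image_shear, gamma_true):
    """Mean shear, error on the mean, and SNR = gamma_true / error."""
    g = np.asarray(per_image_shear, dtype=float)
    mean = g.mean()
    # Error on the mean, treating every image as an independent pointing.
    err = g.std(ddof=1) / np.sqrt(g.size)
    return mean, err, gamma_true / err
```

The same function gives S/N$_s$ if the per-image mean input ellipticities of
the selected galaxies are passed in place of the measured shears.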

Several things can be noted from the signal-to-noise calculations.
Firstly, their magnitude is high, as weak shear has not been measured
from data with SNR > 10. One must not forget however that if weak
lensing shear were constant across large areas of sky, shear would have
been measured with such high signal-to-noise. Secondly, we find that
the signal-to-shot-noise S/N$_s$ is not strongly dependent on the
number of galaxies used in the analysis. Instead, the shot noise is
more dependent on which galaxies have been selected in the analysis,
but note that this statement is unlikely to apply to data where the
shear varies. Taking Im2shape (SB) and BJ02 (MJ) as an example, we find
~ 2 times as many galaxies selected for the Im2shape (SB) analysis as
for the BJ02 (MJ) analysis, but very similar values for the
signal-to-shot-noise S/N$_s$. As discussed in section 5, the
distribution of galaxy ellipticities is strongly non-Gaussian, with
more intrinsically round galaxies than is seen in real data. The galaxy
selection of Im2shape (SB) results in a smaller proportion of these
intrinsically round galaxies being included in the analysis, increasing
the $1\sigma$ variation of the selected galaxy ellipticities. Several
of the KSB+ analyses make a galaxy selection based on galaxy
ellipticity, removing the most elliptical galaxies; again this reduces
the shot noise, independent of the number of galaxies used in the
analysis. Lastly, comparing the signal-to-shot-noise S/N$_s$ and the
signal-to-weighted-shot-noise S/N$'_s$ we see the effectiveness of some
of the weighting schemes used in this analysis. The BJ02 weighting
scheme (MJ, RN) puts more weight on the intrinsically round galaxies;
this effective weighting scheme produces the highest signal-to-noise
measurements in the STEP analysis, although see section 5.6 for the
implication of using this aggressive weighting scheme.

4.1 Calibration bias and PSF contamination 

In this section we measure the levels of multiplicative calibration
bias and additive PSF contamination in each author's shear measurement.
Calibration bias will result from a poor correction for the atmospheric
seeing that circularises the images. Selection bias and weight bias are
also forms of calibration bias which we investigate further in sections
4.2 and 4.3. PSF contamination will result from a poor correction for
the PSF distortion that coherently smears the image.

We calculate the mean shear $\gamma_i$ for each data set as described
above. For each author and PSF type we then determine, from the range
of sheared images, the best-fit parameters to

$$\gamma_1 - \gamma_1^{\rm true} = q\,(\gamma_1^{\rm true})^2 + m\,\gamma_1^{\rm true} + c_1 , \qquad (11)$$

where $\gamma_1^{\rm true}$ is the external shear applied to each
image. Figure 2 shows fits to two example analyses of PSF 3 simulations
using KSB+ (HH implementation) and BJ02 (MJ implementation). In the
absence of calibration bias we would expect $m = 0$. We would also
expect $c_1 = 0$ in the absence of PSF systematics and shot noise, and
$q = 0$ for a linear response of the method to shear. In the case where
the fitted parameter $q$ is consistent with zero, we re-fit with a
linear relationship, as demonstrated by the KSB+ example in Figure 2.
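The fitting procedure can be sketched as a weighted polynomial fit of the shear
residual against the input shear. This is an illustration only: the function
name is hypothetical, and the criterion used for "consistent with zero" (here
$|q| < 2\sigma_q$) is an assumption, since the text does not state the
threshold that was applied.

```python
import numpy as np

def fit_shear_bias(gamma_true, gamma_meas, sigma_gamma, n_sigma=2.0):
    """Fit gamma_meas - gamma_true = q*gamma_true**2 + m*gamma_true + c1."""
    x = np.asarray(gamma_true, dtype=float)
    y = np.asarray(gamma_meas, dtype=float) - x
    w = 1.0 / np.asarray(sigma_gamma, dtype=float)   # polyfit expects 1/sigma
    coeffs, cov = np.polyfit(x, y, 2, w=w, cov="unscaled")
    q, m, c1 = coeffs
    q_err = np.sqrt(cov[0, 0])
    # Assumed criterion: drop the quadratic term if |q| < n_sigma * q_err.
    if abs(q) < n_sigma * q_err:
        (m, c1), _ = np.polyfit(x, y, 1, w=w, cov="unscaled")
        q = 0.0
    return float(q), float(m), float(c1)
```

For the five input shears used here and a per-point error of about 0.0005, the
formal uncertainty on $q$ from such a fit is roughly 0.2, so only a strongly
non-linear response is retained as a quadratic term.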



For all simulations the external applied shear $\gamma_2^{\rm true} = 0$
and we therefore also measure for each PSF type
$c_2 = \langle \gamma_2 \rangle$, averaged over the range of sheared
images. In the absence of PSF systematics and shot noise, we would
expect to find $c_2 = 0$. From this analysis we found the values of $m$
and $q$ to be fairly stable to changes in PSF type and we therefore
define a measure of calibration bias to be $\langle m \rangle$ and a
measure of non-linearity to be $\langle q \rangle$, where the average
is taken over the 6 different PSF sets. We find the value of
$\langle c_i \rangle$ averaged over the 6 different PSF sets to be
consistent with shot noise at the 0.1% level for all authors, with the
highest residuals seen with PSF model 1 (coma) and PSF model 2
(jitter). We therefore define $\sigma_c$ as a measure of our ability to
correct for all types of PSF distortions, where $\sigma_c^2$ is the
variance of $c_1$ and $c_2$ as measured from the 6 different PSF
models. As the underlying galaxy distributions are the same for each
PSF this measure removes most of the contribution from shot noise,
although the galaxy selection criteria will result in slightly
different noise properties in the different PSF data sets. $\sigma_c$
therefore provides a good estimate of the level of PSF residuals in the
whole STEP analysis. A more complicated set of PSF distortions will be
analysed in Massey et al. (in prep) to address the issue of
PSF-dependent bias more rigorously.
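The summary statistics can be sketched as follows. The text does not specify
how the variance over $c_1$ and $c_2$ is formed, so pooling the twelve values
(two components for each of the 6 PSF models) and using the population variance
are assumptions, as is the function name.

```python
import numpy as np

def summarise_psf_fits(m_per_psf, q_per_psf, c1_per_psf, c2_per_psf):
    """Return <m>, <q> and sigma_c from the per-PSF fit parameters."""
    m_mean = float(np.mean(m_per_psf))
    q_mean = float(np.mean(q_per_psf))
    # Assumption: pool c1 and c2 from all PSF models and take the
    # population variance (ddof=0) of the pooled values.
    c_all = np.concatenate([np.asarray(c1_per_psf, dtype=float),
                            np.asarray(c2_per_psf, dtype=float)])
    sigma_c = float(np.sqrt(np.var(c_all)))
    return m_mean, q_mean, sigma_c
```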



[Figure 2 panels: 'KSB+ analysis by HH' (upper) and 'BJ02 analysis by
MJ' (lower); horizontal axis $\gamma_1^{\rm true}$.]

Figure 2. Examples of two analyses of PSF 3 simulations using KSB+ (HH
implementation, upper panel) and BJ02 (MJ implementation, lower panel)
comparing the measured shear $\gamma_1$ and input shear
$\gamma_1^{\rm true}$. The best fit to equation 11 is shown dashed, and
the optimal result (where $\gamma_1 = \gamma_1^{\rm true}$) is shown
dot-dashed. Both analyses have additive errors that are consistent with
shot noise (fitted y-offset parameter $c$) and low 1% calibration
errors (fitted slope parameter $m$). The weighting scheme used in the
BJ02 analysis introduces a non-linear response to increasing input
shear (fitted quadratic parameter $q$), reducing the shear recovery
accuracy for increasing shear. The accuracy of the KSB+ analysis
responds linearly to increasing input shear and so these results were
re-fit with a linear relationship, i.e. $q = 0$.



Figure 3 shows the measures of PSF residuals $\sigma_c$ and calibration
bias $\langle m \rangle$ for each author, where the author key is
listed in Table 2. For the non-linear cases where $q \neq 0$, denoted
with a circle, the best-fit $\langle q \rangle$ parameter is shown with
respect to the right-hand scale. Results in the shaded region suffer
from less than 7% calibration bias. All methods which have been used in
a cosmological parameter cosmic shear analysis lie within this region.
With regard to PSF contamination, these results show that PSF residuals
are better than 1% in all cases and are typically better than 0.1%.
Note that for clarity the results plotted in Figure 3 are also
tabulated in Table 5.

Figure 3. Measures of calibration bias $\langle m \rangle$, PSF
residuals $\sigma_c$ and non-linearity $\langle q \rangle$ for each
author (key in Table 2), as described in the text. For the non-linear
cases where $\langle q \rangle \neq 0$ (points enclosed within a large
circle), $\langle q \rangle$ is shown with respect to the right-hand
scale. In short, the lower the value of $\sigma_c$, the more successful
the PSF correction is at removing all types of PSF distortion. The
lower the absolute value of $\langle m \rangle$, the lower the level of
calibration bias. The higher the $q$ value, the poorer the response of
the method to stronger shear. Note that for weak shear $\gamma < 0.01$,
the impact of this quadratic term is negligible. Results in the shaded
region suffer from less than 7% calibration bias. These results are
tabulated in Table 5.

In the weak $\gamma \leq 0.01$ regime, the most successful method is
found to be the BJ02 technique (MJ, RN), producing percent level
accuracy. For stronger shear distortions, however, this methodology
breaks down, which can be seen from the high $\langle q \rangle$ value.
This method is therefore unsuitable for low redshift cluster mass
reconstructions where shear distortions of ~ 10% are not uncommon,
although see the discussion in section 5.6 for a solution to this issue
of non-linearity. Over the full range of shear distortions tested,
$0 < \gamma < 0.1$, the most successful method is found to be the
Hoekstra implementation of the Kaiser et al. (1995) method (KSB+),
producing results accurate to better than 2%. All KSB+ pipelines are
accurate to better than ~ 15% but the wide range of accuracy in these
results, which are based on the same methodology, is somewhat
disconcerting. It is believed that this spread results from the subtly
different interpretation and implementation of the KSB+ method which we
detail in the Appendix. The results from the Dahle implementation of
K2K (HD) are non-linear, suffering from calibration bias at the ~ 20%
level for weak shear $\gamma < 0.01$. The Wittman/Margoniner method
(VM) (see section 2.5) fares as well as the Hetterscheidt (MH) and
Schrabback (TS) implementations of KSB+ with an accuracy of ~ 15%.
Im2shape (Bridle et al. 2001) (SB) and the Kuijken (2006) (KK)
implementation of shapelets typically fare as well as the methods used
in cosmological parameter cosmic shear analyses with an accuracy of
~ 4%.



4.2 Selection Bias 

Selection bias is an issue that is potentially problematic for many
different types of survey analysis. With weak lensing analyses, which
rely on the fact that when averaging over many galaxies the average
source galaxy ellipticity $\langle e^{(s)} \rangle = 0$, removing even
weak selection biases is particularly important. When compiling source
catalogues one should therefore consider any forms of selection bias
that may alter the mean ellipticity of the galaxy population. This bias
could arise at the source extraction stage if there was a preference to
select galaxies oriented in the same direction as the PSF (Kaiser 2000)
or galaxies that are anti-correlated with the gravitational shear (and
as a result appear more circular) (Hirata & Seljak 2003). Selection
criteria applied after source extraction could also bias the mean
ellipticity of the population if the selection has any dependence on
galaxy shape. In this section we determine the level of selection bias
by measuring the unweighted mean intrinsic source ellipticity
$\langle e^{(s)} \rangle$ (unlensed, as defined above) from the 'real'
galaxies selected by each author for inclusion in their shear catalogue
(false detections are thus excised from the catalogue at this stage).
We follow a similar analysis to section 4.1 by determining for each
author and each PSF type, from the range of sheared images, the
best-fit parameters to

$$\langle e_1^{(s)} \rangle_{\rm selc} = m_{\rm selc}\,\gamma_1^{\rm true} + c_1^{s}$$
$$\langle e_2^{(s)} \rangle_{\rm selc} = c_2^{s} . \qquad (12)$$

$\langle m_{\rm selc} \rangle$ averaged over the 6 different PSF data
sets gives a measure of the shear-dependent selection bias and
$(\sigma_c^{s})^2$, the variance of $c_1^{s}$ and $c_2^{s}$ as measured
from the 6 different PSF models, gives a measure of the
PSF-anisotropy-dependent selection bias. We find that
PSF-anisotropy-dependent selection bias is very low, with
$\sigma_c^{s} < 0.001$ for all methods. Shear-dependent selection bias
is < 1% in most cases with some notable exceptions in the cases of
Clowe (C1 & C2), Schrabback (TS), Dahle (HD) and Nakajima (RN) as shown
on the vertical axis of Figure 4. The significant variation between the
different PSF data sets of $m_{\rm selc}$ measured with the Clowe (C1 &
C2) catalogues suggests that the selection criteria of this method are
affected by the PSF type.
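The shear-dependent part of equation 12 amounts to a straight-line fit of the
mean intrinsic ellipticity of the selected galaxies against the input shear. A
minimal sketch is given below; the function name is hypothetical and the
unweighted least-squares fit is an assumption, since the text does not state
how the points were weighted.

```python
import numpy as np

def selection_bias_fit(gamma1_true, e1_source_selected):
    """Fit <e1_source>_selc = m_selc * gamma1_true + c1_s.

    e1_source_selected holds one array per applied shear, containing the
    intrinsic (unlensed) e1 of the real galaxies that passed selection.
    """
    x = np.asarray(gamma1_true, dtype=float)
    mean_e1 = np.array([np.mean(e) for e in e1_source_selected])
    # Assumption: unweighted least-squares straight-line fit.
    m_selc, c1_s = np.polyfit(x, mean_e1, 1)
    return float(m_selc), float(c1_s)
```

A negative `m_selc` corresponds to a selection that preferentially keeps
galaxies whose intrinsic ellipticity is anti-aligned with the applied shear.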

Figure 4 also shows the value of $\langle m_{\rm uncontaminated} \rangle$
determined from equation 11 using the authors' measured shear
catalogues now cleansed of false detections and stellar contamination,
with author-defined object weights. With unbiased weights and an
unbiased shear measurement method (where the shear is measured
accurately but the source selection criteria are potentially biased),
points should fall along the 1:1 line plotted. We can therefore
conclude from Figure 4 that in many cases the calibration bias seen in
section 4.1 cannot be solely attributed to selection bias. See section
5 for a discussion on sources of selection bias. The results plotted in
Figure 4 are also tabulated in Table 5. Comparing the calibration
biases measured from the original catalogues $\langle m \rangle$ in
section 4.1 and from the 'uncontaminated' catalogues
$\langle m_{\rm uncontaminated} \rangle$ shows the impact of false
detections and stellar contamination in each author's catalogue.
Typically the impact is low with < 3% changes found for the average
measured shear of most authors. One noticeable exception is the result
from the Brown (MB) pipeline, where the underestimation of the shear by
~ 7% is found to be predominantly caused by the diluting ~ 7% stellar
contamination in the object catalogues.

Figure 4. Measures of selection bias $\langle m_{\rm selc} \rangle$ for
each author (key in Table 2), as described in the text. The lower the
absolute value of $\langle m_{\rm selc} \rangle$, the lower the level
of selection bias. Selection bias can be compared to the calibration
bias $\langle m_{\rm uncontaminated} \rangle$ measured from catalogues
cleansed of false detections and stellar contamination. Unbiased shear
measurement methods, where the shear is measured accurately but the
source selection criteria are potentially biased, would fall along the
1:1 line over-plotted. These results are tabulated in Table 5.

4.3 Weight Bias 

In this section we investigate the impact of the different
object-dependent weighting schemes used by Bridle (SB), Clowe (C1 &
C2), Hetterscheidt (MH), Hoekstra (HH), Kuijken (KK), Schrabback (TS)
and Van Waerbeke (LV). All other methods use unit weights, except for
the methods of Jarvis (MJ) and Nakajima (RN) which will be discussed at
the end of this section. An optimal weighting scheme should reduce the
noise on a measurement without biasing the results. Using the
author-defined weights we compare the average unweighted and weighted
mean intrinsic galaxy ellipticity, performing a similar analysis to
sections 4.1 and 4.2. For each author and PSF type we calculate, from
the range of sheared images, the best-fitting parameters to

$$\langle e_1^{(s)} \rangle'_{\rm selc} - \langle e_1^{(s)} \rangle_{\rm selc} = m_{\rm weight}\,\gamma_1^{\rm true} + c_1^{w} , \qquad (13)$$

where $\langle e_1^{(s)} \rangle_{\rm selc}$ is an unweighted average
and $\langle e_1^{(s)} \rangle'_{\rm selc}$ is a weighted average. In
the absence of PSF-dependent weight bias, $c_1^{w}$ should be
consistent with zero and we find this to be the case for all the
weighting schemes tested. In the absence of shear-dependent weight
bias, $m_{\rm weight}$ should be consistent with zero. All weighting
schemes are found to introduce low percent level bias as shown in Table
5, where $\langle m_{\rm weight} \rangle$ is averaged over the 6
different PSF models. In most cases these biases are small (< 2%) and
we can therefore conclude that the cases of calibration bias seen in
section 4.1 cannot be solely attributed to weight bias. For percent
level precision in future analyses the issue of weight bias will need
to be considered.
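The quantity on the left-hand side of equation 13 can be computed for a single
data set as sketched below; repeating this for each applied shear and fitting a
straight line against $\gamma_1^{\rm true}$ then gives $m_{\rm weight}$ and
$c_1^{w}$. The function and argument names are hypothetical.

```python
import numpy as np

def weight_bias_difference(e1_source, weights):
    """Weighted minus unweighted mean intrinsic e1 of the selected galaxies."""
    e = np.asarray(e1_source, dtype=float)
    w = np.asarray(weights, dtype=float)
    unweighted = e.mean()
    weighted = np.sum(w * e) / np.sum(w)
    return weighted - unweighted
```

With unit weights the difference vanishes identically, so any non-zero value
reflects a correlation between the assigned weights and the intrinsic
ellipticities.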

The Jarvis (MJ) and Nakajima (RN) analyses make use of the
ellipticity-dependent weighting formulae in BJ02 Section 5. This
weighting scheme takes advantage of the $e = 0$ peak in the shape
distribution of galaxies to improve the signal-to-noise of weak shear
measurement. This is evidenced by the high signal-to-noise results with
the Jarvis (MJ) and Nakajima (RN) methods as listed in Table 4.
Shearing the galaxies does change the assigned weights, but the BJ02
formulae explicitly account for this effect via a factor called the
responsivity. The non-linear response to shear seen in the results of
the Jarvis (MJ) and Nakajima (RN) methods is an undesirable consequence
of this weighting scheme which we discuss further in section 5.6.

4.4 Shear measurement dependence on galaxy properties 

The simulations analysed in this paper were sheared uniformly across
the field-of-view. In reality however, the gravitational shear
experienced by each galaxy is dependent on position and, more
importantly, redshift. High redshift galaxies have a lower apparent
magnitude and smaller angular size when compared to their lower
redshift counterparts. It is therefore important that shear measurement
methods are stable to changes in galaxy magnitude and size. For each
author, we measure the average shear as a function of magnitude and
input disk size. In general, we find that the average shear binned as a
function of magnitude and disk size varies by < 1% from the average
shear measured from the full data set, and an example plot of shear
measured as a function of galaxy magnitude is shown from the KSB+
implementation of HH in Figure 5. The dot-dashed line shows the average
$\gamma_1 - \gamma_1^{\rm true}$ measured from the full galaxy sample,
which is dominated by the faint magnitude galaxies. For this particular
analysis the shear measured from bright galaxies is slightly
underestimated, and the shear from faint galaxies is slightly
overestimated. The reader should note however that the shear measured
from each magnitude bin is $< 1\sigma$ from the average for all but one
case and that for weaker input shears, this effect is even less
prominent.
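A binned comparison of this kind can be sketched as follows. The function name
and bin edges are hypothetical, and the error on each bin is taken as the error
on the mean of the per-galaxy residuals, which is an assumption about how the
uncertainties were formed.

```python
import numpy as np

def binned_shear_residual(magnitude, gamma1_meas, gamma1_true, bin_edges):
    """Mean of gamma1_meas - gamma1_true, and its error, per magnitude bin."""
    mag = np.asarray(magnitude, dtype=float)
    resid = np.asarray(gamma1_meas, dtype=float) - gamma1_true
    # Bin k covers [bin_edges[k], bin_edges[k+1]); objects outside are ignored.
    idx = np.digitize(mag, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    means = np.full(n_bins, np.nan)
    errors = np.full(n_bins, np.nan)
    for k in range(n_bins):
        r = resid[idx == k]
        if r.size > 1:
            means[k] = r.mean()
            errors[k] = r.std(ddof=1) / np.sqrt(r.size)
    return means, errors
```

The same function can be applied to input disk size by passing sizes in place
of magnitudes.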

Investigating the dependence of shear on galaxy properties we 
found that some methods introduced correlations between shear 
and magnitude, whilst others between shear and disk size. Inter- 
estingly however all methods revealed very different dependencies 
on galaxy properties that we were unable to directly parameterise. 
As such we cannot fully address the issue of shear measurement 
dependence on galaxy properties at this time. For percent level pre- 
cision in future analyses this issue will certainly need to be revisited 
and it will be addressed further in future STEP projects using sim- 
ulations with constant shear and constant galaxy magnitude. 



5 DISCUSSION 

In this section we discuss some of the lessons that we have learnt 
from the first STEP initiative and highlight the areas where we can 
improve our methods in future analyses. 

5.1 KSB+ 

The subtle differences between the eight tested KSB+ pipelines,
detailed in the Appendix, introduce an interesting spread in the
KSB+ results. Using the information in the Appendix, KSB+ users
can now modify pipelines to improve their results. The different
ways of implementing KSB+ and the effect of using different
methods will be discussed in more detail in a future paper
(Hetterscheidt et al., in prep.), but comparing methods and results
makes clear which interpretations of the KSB+ method are best for
ground-based data.
A good example of this is the PSF correction method of Heymans
(CH) and Clowe (C2), where the correction is calculated as a
function of galaxy size. For ground-based data where the PSF
ellipticity is fairly constant at all isophotes (although note that
this was not the case with PSF 2), a PSF correction determined only
at the stellar size produces a less noisy and more successful PSF
correction, as shown by the success of the PSF correction by other
KSB+ users. This, however, would not necessarily be the case for
space-based data where the PSF ellipticity varies with size (see for
example Heymans et al. 2005), which will be tested in a future STEP
analysis of simulated space-based observations. The Schrabback (TS)
method produces a more successful size-dependent PSF correction
by limiting the image region about stellar objects over which the
PSF correction parameter $p_\mu(r_g)$ is calculated
($\theta_{\rm max} = 3r_g^*$, see Appendix A2). This measure reduces
the noise on $p_\mu(r_g)$, thus improving the overall correction.

For several methods selection bias is well below the percent
level, from which we can conclude that current source detection
methods are suitable for weak lensing analyses and that any
selection bias seen with other methods has been introduced after the
source extraction stage. The first clue to understanding the
selection bias we see in some cases comes from comparing
$\langle m_{\rm selc} \rangle$ for the Hetterscheidt (MH) and
Schrabback (TS) results in Figure 4. These two analyses stem from the
same SExtractor catalogue. The main differences between these two
methods are the technique used to correct for the PSF distortion and
the catalogue selection criteria, where Schrabback (TS) places more
conservative cuts on galaxy size defined by the flux_radius parameter
of SExtractor. Whilst there is no correlation within the simulations
between intrinsic galaxy ellipticity and disk size, we find that the
measured hfindpeaks $r_g$ parameter and the measured SExtractor
flux_radius and FWHM parameters are somewhat correlated with galaxy
ellipticity. For this reason galaxy size selection criteria based on
$r_g$, flux_radius or FWHM will introduce a bias. This finding is one
of the first lessons learnt from this STEP initiative, which can now
be improved upon in future STEP analyses.

Author  ⟨m⟩               σ_c      ⟨q⟩            ⟨m_uncontaminated⟩   ⟨m_selc⟩          ⟨m_weight⟩       σ_8 analysis?
SB      -0.048 ± 0.027    0.0018   -              -0.017 ± 0.030       0.006 ± 0.004     0.007 ± 0.002    ×
MB      -0.071 ± 0.015    0.0008   -              -0.009 ± 0.021       -0.008 ± 0.002    -
C1      -0.100 ± 0.018    0.0006   -              -0.090 ± 0.018       -0.046 ± 0.022    0.011 ± 0.004    ×
C2      -0.084 ± 0.018    0.0115   -              -0.074 ± 0.018       -0.045 ± 0.022    0.010 ± 0.003    ×
HD      0.219 ± 0.036     0.0005   -2.40 ± 0.27   0.217 ± 0.028        -0.021 ± 0.006    -                ×
MH      -0.161 ± 0.014    0.0008   -              -0.142 ± 0.015       -0.017 ± 0.001    0.032 ± 0.003    ×
CH      -0.032 ± 0.028    0.0035   -              0.004 ± 0.027        -0.010 ± 0.003    -                ✓
HH      -0.015 ± 0.006    0.0008   -              0.018 ± 0.004        -0.001 ± 0.001    0.006 ± 0.001    ✓
MJ      0.002 ± 0.027     0.0003   1.39 ± 0.23    0.011 ± 0.027        0.005 ± 0.006     -                ✓
KK      -0.031 ± 0.023    0.0017   -              -0.029 ± 0.023       0.006 ± 0.003     0.020 ± 0.002    ×
VM      -0.164 ± 0.028    0.0014   -              -0.116 ± 0.021       -0.015 ± 0.006    -                ×
RN      -0.011 ± 0.011    0.0004   1.47 ± 0.09    0.001 ± 0.013        -0.037 ± 0.009    -                ×
TS      -0.167 ± 0.011    0.0003   -              -0.158 ± 0.010       -0.045 ± 0.006    0.024 ± 0.003    ×
LV      -0.068 ± 0.025    0.0006   -              -0.068 ± 0.025       -0.001 ± 0.002    0.005 ± 0.001    ✓

Table 5. Tabulated measures of calibration bias ⟨m⟩, PSF residuals σ_c and non-linearity ⟨q⟩ for each author (key in Table 2), as described in Section 4.1
and plotted in Figure 3. For the non-linear cases where ⟨q⟩ ≠ 0, ⟨q⟩ is listed. 'Uncontaminated' calibration bias ⟨m_uncontaminated⟩ is measured from
object catalogues cleansed of stellar contamination and false object detections. This can be compared to the measured selection bias ⟨m_selc⟩ as described
in Section 4.2 and plotted in Figure 4. Weight bias ⟨m_weight⟩, described in Section 4.3, is also tabulated. For reference, the final column lists which pipelines
have been used in cosmic shear analyses that have resulted in measurements of the amplitude of the matter power spectrum, σ_8, as detailed in Table 1.

[Figure 5: plot titled 'KSB+ analysis by HH'; horizontal axis: I band magnitude]

Figure 5. An example plot of the difference between measured shear $\gamma_1$
and input shear $\gamma_1^{\rm true}$ as a function of galaxy I band magnitude. This plot
is taken from the KSB+ analysis of HH using the PSF simulations with
an input shear $\gamma_1^{\rm true} = 0.05$. The dot-dashed line shows the average
$\gamma_1 - \gamma_1^{\rm true}$ measured from the full galaxy sample.

5.2 K2K

The Dahle (HD) K2K results appear noisier than those of other
pipelines, which could result from an upper significance cut applied
in order to remove big, bright galaxies, which in real data are
unlensed galaxies at low redshift. This step rejects ~24% of the
objects. The method is optimised for mosaic CCD data with a high
number of galaxies in each exposure; it therefore suffers somewhat
from the low number of objects in each 4096 × 4096 STEP image. In
addition, as a space-saving measure, images were stored in integer
format, which may have introduced some extra noise in the
're-circularised' images. In considering the success of K2K applied
to STEP simulations one should keep in mind that the man-hours
invested in testing and fine-tuning KSB+ are at least an order of
magnitude more than for any of the other methods. With the STEP
simulations, future tests and optimisation are now feasible, the
results of which will be demonstrated with the next STEP analysis of
shapelet based image simulations.

5.3 Shapelets 

In the first, blind Kuijken (KK) analysis of the simulations all
sources were fitted to 8th order in shapelets, which gives a good
fit to the PSF-convolved sources. This, however, resulted in a
systematic underestimate of the shear amplitude of some 10%. Later
investigation showed that even without any PSF smearing or noise,
the ellipticity of an exponential disk is only derived correctly if
the expansion is extended to 12th order. As this method has, to date,
not been used in scientific analyses, it was decided that a
re-analysis of the simulations with 12th order shapelets would be
permitted. The results of the non-blind re-analysis are shown in this
paper. Using the higher order shapelet terms removed the systematic
underestimate for the high S/N sources. There is still a tendency,
however, for noisy sources to have their ellipticities
underestimated, and this is still under investigation.

5.4 Im2shape 

Im2shape uses MCMC sampling to fit elliptical Gaussians to the 
image. Before the STEP analysis it was believed that using too few 
iterations in the MCMC analysis would add noise to the ellipticities 
of each galaxy but would not systematically bias them. It became
apparent during this STEP analysis, however, that a bias is in fact
introduced as the number of iterations is decreased. The number of
iterations was therefore chosen by systematically increasing it in
the analysis of a subsample of the data until the measured average
shear converged.
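
The convergence procedure described above can be sketched generically as follows. This is not Im2shape code: `estimate` stands for any (hypothetical) callable that runs the fit on the subsample with a given number of iterations and returns the average shear, and the starting value, growth factor and tolerance are assumptions chosen for illustration.

```python
def converge_iterations(estimate, n_start=1000, factor=2, tol=1e-3, n_max=10**6):
    """Increase the iteration count until successive estimates agree within tol.

    `estimate(n)` returns the average shear obtained with n sampler iterations.
    Returns (n, value) for the first n whose estimate differs from the
    previous one by less than tol.
    """
    n, previous = n_start, estimate(n_start)
    while n * factor <= n_max:
        n *= factor
        current = estimate(n)
        if abs(current - previous) < tol:
            return n, current
        previous = current
    raise RuntimeError("average shear did not converge within n_max iterations")


def toy_estimate(n):
    """Hypothetical stand-in for a pipeline run: a bias that decays as 1/n."""
    return 0.05 * (1.0 - 500.0 / n)


if __name__ == "__main__":
    print(converge_iterations(toy_estimate))
```

The toy bias model is only there to make the sketch runnable; the functional form of the real bias with iteration number is not specified in the text.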

5.5 Wittman method with ellipto 

A post-STEP analysis of the shape catalogue revealed that the
measured galaxy shape distribution resulting from this method had
rather asymmetric tails. The core of the distribution reflected the
shear much more accurately than did the mean of the entire
distribution. This method could thus be greatly improved by some type
of weighting or robust averaging scheme. For example, a simple
iterative $3\sigma$ clip reduced the 15% underestimate of the strongest
applied shear, where $\gamma = 0.1$, to an 8% underestimate, while
rejecting only 2.2% of the sources. A slightly harsher clip at
$2.8\sigma$ further reduced the underestimate to 3.5%, while still
rejecting only 3.9% of the sources. The stellar contamination rate of
3.8% is presumably responsible for the remaining underestimate. Note
that the real data to which this method has been applied are much
deeper than the STEP simulations. The stellar contamination rate would
therefore be much lower, as the galaxy counts rise more steeply with
magnitude than the star counts.

Of course, one would prefer to understand the origin of the 
asymmetric outliers rather than simply clipping them at the end. A 
brief analysis shows that they are not highly correlated with the ob- 
vious variables such as photometric signal-to-noise or size relative 
to the PSF. Therefore a simple inverse-variance weighting scheme 
would not be enough to solve the problem. The prime task for im- 
proving this method would thus be understanding the cause of this 
asymmetric tail and developing a mitigation scheme. 

5.6 Bernstein & Jarvis Method: BJ02 

The ellipticity-dependent weighting scheme of BJ02 is responsible
for the significant increase in the signal-to-noise of the STEP
shear measurements, as shown in Table 4. It has, however, also been
found to be the cause of the non-linear response of the Jarvis (MJ)
and Nakajima (RN) methods to shear. After the blind testing phase,
the results of which are shown in this paper, Jarvis (MJ) re-ran the
analysis with shape-independent weights, finding a linear response
to the range of weak shears tested such that the non-linearity
parameter, $q$, measured by equation 11 became consistent with zero.
The signal-to-noise dropped, however, by a factor of 1.5. We can thus
recommend that weak shear studies use aggressive weights which
help to probe small departures of $\langle\gamma\rangle$ from zero,
while studies of stronger shear regions use unweighted measurements
to minimise the effects of non-linearity.
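
To make the role of the non-linearity parameter concrete, the sketch below fits a quadratic response to a set of input and measured shears. It assumes that $q$ enters as the coefficient of a term quadratic in the input shear, $\gamma_1 - \gamma_1^{\rm true} = q(\gamma_1^{\rm true})^2 + m\gamma_1^{\rm true} + c_1$; this form, the function names and the toy numbers are assumptions of the sketch and should be checked against the definition given in Section 4.1.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_shear_response(gamma_true, gamma_meas):
    """Unweighted least-squares fit of
    gamma_meas - gamma_true = q * g**2 + m * g + c, with g = gamma_true.
    Returns (q, m, c)."""
    basis = [[g * g, g, 1.0] for g in gamma_true]
    resid = [gm - g for g, gm in zip(gamma_true, gamma_meas)]
    # Normal equations (A^T A) x = A^T b for the three coefficients.
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    atb = [sum(row[i] * r for row, r in zip(basis, resid)) for i in range(3)]
    q, m, c = solve_linear(ata, atb)
    return q, m, c


if __name__ == "__main__":
    # Toy input: five input shears and a hypothetical non-linear response.
    g_true = [0.0, 0.005, 0.01, 0.05, 0.1]
    g_meas = [g + 1.4 * g * g - 0.01 * g + 0.0005 for g in g_true]
    print(fit_shear_response(g_true, g_meas))
```

A value of $q$ consistent with zero then corresponds to the linear response reported for the shape-independent weights.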

The false detections in the Nakajima (RN) analysis were
investigated and found to be either double objects detected by
SExtractor as a single object or diffraction spikes. Double object
detections could be reduced by varying SExtractor parameters to
encourage the deblending of overlapping sources. When the data are
taken in several exposures an additional measure to reduce the
number of false detections can be introduced. This approach, taken by
Jarvis et al. (2003), demands that a source is detected in at least
two of the four exposures taken of each field. The STEP simulations
were single exposure images and so this procedure could not
be implemented. These false detections will generally be faint and,
in the case of diffraction spikes, highly elliptical. Thus, with the
weighting scheme implemented in both the Jarvis (MJ) and Nakajima
(RN) analyses, these down-weighted objects do not affect the
overall average measured shear.
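
The multi-exposure requirement described above can be illustrated with a simple positional cross-match. The sketch assumes a reference catalogue and per-exposure catalogues of (x, y) positions in a common pixel frame, and a hypothetical matching radius; it is not the procedure of Jarvis et al. (2003), whose details are not given here.

```python
def count_exposure_matches(source, exposures, match_radius):
    """Number of exposure catalogues with a detection within match_radius."""
    x, y = source
    r2 = match_radius ** 2
    return sum(
        any((x - ex) ** 2 + (y - ey) ** 2 <= r2 for ex, ey in catalogue)
        for catalogue in exposures
    )


def confirmed_sources(reference, exposures, match_radius=1.0, min_exposures=2):
    """Keep reference sources detected in at least min_exposures exposures."""
    return [
        s for s in reference
        if count_exposure_matches(s, exposures, match_radius) >= min_exposures
    ]


if __name__ == "__main__":
    # Toy positions (hypothetical): only the first source is seen twice.
    reference = [(10.0, 10.0), (50.0, 50.0), (80.0, 20.0)]
    exposures = [
        [(10.2, 9.9), (50.0, 50.0)],
        [(10.0, 10.3)],
        [(80.0, 20.0)],
        [],
    ]
    print(confirmed_sources(reference, exposures))
```

A brute-force match is used for brevity; a real catalogue would call for a spatial index.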



6 CONCLUSION 

In this paper we have presented the results of the first Shear
TEsting Programme, where the accuracy of a wide range of shear
measurement methods was assessed. This paper has demonstrated that,
for smooth galaxy light profiles, it is currently feasible to measure
weak shear at percent level accuracy using the Bernstein & Jarvis
(2002) method (BJ02) and the Hoekstra implementation of the
KSB+ method. It has also shown how important it is to verify shear
measurement software with image simulations, as subtle differences
between each individual's implementation can result in discrepancies.
We therefore strongly urge all weak lensing researchers to subject
their pipelines to a similar analysis to ensure high accuracy and
reliability in all future weak lensing studies. To this end the STEP
simulations will be made available on request.

The removal of the additive PSF anisotropic distortion has
been successful in all methods, reduced to an equivalent shear of
~0.001 in most cases. Significant calibration bias is, however, seen
in the results of some methods, which can be explained only in
part by the use of biased weights and/or selection bias. Using the
simulations analysed in this paper, errors can now be pin-pointed
and corrected for, and modifications will be introduced to remove
sources of calibration error. For authors using the KSB+ method,
detailed descriptions have been given of each pipeline tested in
this analysis to aid the improvement and development of future
KSB+ methods. One positive aspect of the KSB+ method is that
its response to shear has been shown to be very linear. This is in
contrast to the BJ02 method tested in this paper, where the
ellipticity-dependent weighting scheme was found to introduce a
non-linear response to shear. For this reason KSB+ or an unweighted
version of the BJ02 method is currently the preferred method for
measuring weak shear around nearby galaxy clusters. Cosmic shear, on
average, is very weak, but with the next generation of cosmic shear
surveys covering large areas on the sky and thus imaging regions
of both high and low shear, cosmic shear measurement also
requires a method that is linear in its response to shear. Thus KSB+
or an unweighted version of the BJ02 method is currently the
preferred cosmic shear measurement method. In the weakest regime of
galaxy-galaxy lensing, the weighted BJ02 method measures shear
at a higher signal-to-noise with a better accuracy than KSB+ and
thus appears to be the most promising of the methods that have
been tested in this analysis for galaxy-galaxy lensing studies.

Selection bias has been shown to be consistent with zero in
some cases, from which we can conclude that current source
detection methods are suitable for weak lensing analyses. Some object
weighting schemes were found to be unbiased at the sub-percent
level. The use of such schemes may, however, require revision in the
future when low level biases become important. All the methods
tested were found to exhibit rather different < 1% dependences on
galaxy magnitude and size. For real data where shear scales with
depth and hence magnitude and size, these issues will need to be
addressed.

In this paper we have provided a snapshot view of how accurately
we can measure weak shear today from galaxies with relatively
simple morphologies. We are unable to answer the question, 'what
method ought I to use to measure weak lensing shear?' KSB+, used with
care, and BJ02 clearly fare well, but some of the methods tested here
that are still in their development stage may yet provide a
better method in the future. For the cosmic shear, galaxy-galaxy
lensing and cluster-mass determinations published to date,
$\lesssim 7\%$ calibration errors are within statistical errors and are
certainly not dominant. $\sigma_c < 0.01$ is also small enough to be
sub-dominant in present work. We voice caution in explaining the
$\sim 2\sigma$ differences in cosmological parameter estimation from
cosmic shear studies by the scatter in the results that we find in
this analysis. The true reason is likely to be more complex,
involving source redshift uncertainties, residual systematics and
sampling variance in addition to the calibration errors we
have found. Many of these sources of error will be significantly
reduced with the next generation of surveys, where the large areas
surveyed will minimise sampling variance and the multi-colour
data will provide a photometric redshift estimate of the source
redshift distribution. The now widespread use of diagnostic tools to
determine levels of non-lensing residual distortions also allows for
the quantification and reduction of systematic errors. Calibration
errors, however, can only be directly detected through the analysis
of image simulations.

This first STEP analysis has quantified the current levels of
calibration error, allowing for improvement in calibration accuracy
in future shear measurement methods. The upcoming next generation
of wide-field multi-colour optical surveys will reduce statistical
errors on various shear measurements to the ~2% level, requiring
calibrations accurate to ~1%. In the next decade, deep weak-lensing
surveys of thousands of square degrees will produce shear
measurements that will be degraded by calibration errors of > 0.1%,
well below even the precision of the current STEP tests.
Similarly the additive errors represented by $\sigma_c$ will ultimately
have to be reduced to a level of $\sigma_c \lesssim 10^{-3.5}$ if this
spurious signal is to be below the measurement limits imposed by
cosmic variance of full-sky surveys. The collective goal of the weak
lensing community is now to meet these challenges.

The next STEP project will analyse a set of ground and
space-based image simulations that include complex galaxy
morphologies using a 'shapelet' composition (Massey et al. 2004).
Initial tests with shapelet simulations suggest that complex
morphology rather complicates weak shear measurement for methods that
assume Gaussian light profiles. Further STEP projects will address
the issue of PSF interpolation and modeling, and the impact of using
different data reduction and processing techniques (Erben et al.
2005). These future STEP projects will be as important as this first
STEP analysis in order to gain more understanding and further
improve the accuracy of our methods. We conclude with the hope
that by using the shared technical knowledge compiled by STEP,
all future shear measurement methods will be able to reliably and
accurately measure weak lensing shear.
APPENDIX A: KSB+ IMPLEMENTATION

The KSB+ method, used by a large percentage of the authors,
has been shown in this STEP analysis to produce remarkably
different results. In this Appendix, to aid the future understanding
of these differences, we detail how different authors have
implemented KSB+ within their weak lensing pipelines, as summarised
in Table A1.



A1 Source detection, centroids and size definitions

Most authors use the SExtractor software (Bertin & Arnouts 1996)
to detect objects and define galaxy centroids. Exceptions are
Hoekstra (HH) and Brown (MB), who use hfindpeaks from the imcat
software. The Gaussian weight scale length $r_g$ is then either set to
the flux_radius SExtractor parameter or the 'optimal' $r_g$ value
defined by hfindpeaks. Clowe (C1&2) uses both pieces of software,
using a version of hfindpeaks to determine the optimal weight
scaling $r_g$ that keeps the centroid fixed to the SExtractor
co-ordinates. Hetterscheidt (MH) and Schrabback (TS) measure half
light radii $r_h$ and refine the SExtractor centroids using the
iterative method described in Erben et al. (2001).



A2 Quadrupole moments and integrals 

The weighted ellipticity $\varepsilon$ (equation 3), and the smear and
shear polarisability tensors $P^{\rm sm}$ and $P^{\rm sh}$, are calculated for
each object using software developed from the imcat subroutine
getshapes. The continuous integral formulae are calculated from the
discrete pixelised data by approximating the integrals as discrete
sums. The weighted ellipticity $\varepsilon$ is calculated from the quadrupole
moment, which in its discrete form can be written as

$$
Q_{ij} = \frac{\sum_{\theta_i,\theta_j=-\theta_{\rm max}}^{\theta_{\rm max}} \Delta\theta^2 \, W(\theta_i,\theta_j) \, I(\theta_i,\theta_j) \, \theta_i \theta_j}{\sum_{\theta_i,\theta_j=-\theta_{\rm max}}^{\theta_{\rm max}} \Delta\theta^2 \, W(\theta_i,\theta_j) \, I(\theta_i,\theta_j)} , \qquad {\rm (A1)}
$$

where $\theta$ is measured, in pixel units, from the source centroid.
Table A1 lists each author's chosen values for $\theta_{\rm max}$ and
$\Delta\theta$. For real (non-integer) values of $\theta$, the intensity
$I(\theta_i, \theta_j)$, known at pixel positions, is estimated from a
first-order interpolation over the four nearest pixels to
$(\theta_i, \theta_j)$ (denoted 'Interpolation' in Table A1). The
interpolation stage is by-passed by some authors by setting
$\Delta\theta = 1$ pixel and approximating
$I(\theta_i, \theta_j) \approx I({\rm Int}[\theta_i], {\rm Int}[\theta_j])$
(denoted 'Approx' in Table A1), or by exchanging the value of $\theta$,
in the above formula, for its nearest integer value ${\rm Int}[\theta]$
(denoted 'Integer' in Table A1). $P^{\rm sm}$ and $P^{\rm sh}$ are functions of
weighted moments, up to fourth order, that include $\theta_i\theta_j$
terms. Some authors treat these second order terms in $\theta$
differently, using the nearest integer values of $\theta$ (denoted
'Integer' in the $P^{\rm sh}$ and $P^{\rm sm}$ estimate column of Table A1).
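
As a hedged illustration of the discrete sum in equation A1, the following sketch evaluates the weighted quadrupole moments of a pixelised image with a Gaussian weight of scale $r_g$, sampling the image by first-order (bilinear) interpolation. It is not derived from getshapes: the function names, the image layout (`image[y][x]`) and the toy Gaussian galaxy are assumptions of the example, and no bounds checking is attempted.

```python
import math


def bilinear(image, x, y):
    """First-order interpolation of image[y][x] over the four nearest pixels."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * image[y0][x0]
            + fx * (1 - fy) * image[y0][x0 + 1]
            + (1 - fx) * fy * image[y0 + 1][x0]
            + fx * fy * image[y0 + 1][x0 + 1])


def quadrupole(image, xc, yc, r_g, theta_max, dtheta=1.0):
    """Weighted quadrupole moments (Q11, Q22, Q12) about the centroid (xc, yc).

    The common factor dtheta**2 cancels between numerator and denominator.
    """
    n = int(round(theta_max / dtheta))
    q11 = q22 = q12 = norm = 0.0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            t1, t2 = i * dtheta, j * dtheta
            weight = math.exp(-(t1 * t1 + t2 * t2) / (2.0 * r_g * r_g))
            wi = weight * bilinear(image, xc + t1, yc + t2)
            q11 += wi * t1 * t1
            q22 += wi * t2 * t2
            q12 += wi * t1 * t2
            norm += wi
    return q11 / norm, q22 / norm, q12 / norm


def ellipticity(q11, q22, q12):
    """Ellipticity components formed from the quadrupole moments."""
    return (q11 - q22) / (q11 + q22), 2.0 * q12 / (q11 + q22)


def toy_gaussian_image(size, xc, yc, sx, sy):
    """Hypothetical test image: an elliptical Gaussian aligned with the axes."""
    return [[math.exp(-((x - xc) ** 2 / (2 * sx * sx) + (y - yc) ** 2 / (2 * sy * sy)))
             for x in range(size)] for y in range(size)]


if __name__ == "__main__":
    img = toy_gaussian_image(33, 16, 16, 3.0, 2.0)
    print(ellipticity(*quadrupole(img, 16, 16, r_g=3.0, theta_max=9, dtheta=1.0)))
    print(ellipticity(*quadrupole(img, 16, 16, r_g=3.0, theta_max=9, dtheta=0.25)))
```

With an integer centroid and $\Delta\theta = 1$ pixel the interpolation reduces to reading pixel values directly, which is the situation that the 'Approx' scheme approximates; the second call shows the sub-pixel sampling used by the 'Interpolation' scheme.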



A3 Anisotropic PSF modeling 

Stellar objects are selected by eye from the stellar locus in a
size-magnitude plane and are then used to produce a polynomial model
of the PSF as a function of chip position. Hetterscheidt (MH),
Heymans (CH) and Schrabback (TS) fit directly to $p_\mu$ (equation 5)
which, in the case of Heymans (CH) and Schrabback (TS), is
measured for varying $r_g$ (Hoekstra et al. 1998). This is in contrast
to Hetterscheidt (MH), who measures $p_\mu$ with $r_g = r_g^*$. Brown
(MB), Clowe (C1&2), Hoekstra (HH) and Van Waerbeke (LV) create
models of $\varepsilon^{*\rm obs}$, $P^{\rm sm*}$ and $P^{\rm sh*}$ separately,
where for Brown (MB) and the first Clowe method (C1) stellar shapes
are measured with $r_g = r_g^*$. The second Clowe method (C2), the
Hoekstra (HH) method and the Van Waerbeke (LV) method measure the
stellar parameters for varying $r_g$. Note that the Van Waerbeke (LV)
method fits each component of the $P^{\rm sm*}$ and $P^{\rm sh*}$ tensors.
With PSF models in hand, observed galaxy ellipticities are corrected
according to equation 4.
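
The polynomial PSF model described above can be sketched as an ordinary least-squares fit of a two-dimensional polynomial to a stellar quantity measured at the star positions. The code below is a generic illustration under stated assumptions: the function names are hypothetical, chip coordinates are taken to be rescaled to the range [-1, 1], and no outlier clipping is applied.

```python
def poly_terms(x, y, order=2):
    """All monomials x**a * y**b with a + b <= order."""
    return [x ** a * y ** (d - a) for d in range(order + 1) for a in range(d + 1)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_psf_model(stars, order=2):
    """Least-squares polynomial coefficients for stars = [(x, y, value), ...]."""
    rows = [poly_terms(x, y, order) for x, y, _ in stars]
    vals = [v for _, _, v in stars]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * v for r, v in zip(rows, vals)) for i in range(n)]
    return solve_linear(ata, atb)


def eval_psf_model(coeffs, x, y, order=2):
    """Evaluate the fitted model at the position of a galaxy."""
    return sum(c * t for c, t in zip(coeffs, poly_terms(x, y, order)))


if __name__ == "__main__":
    # Toy stellar ellipticity pattern (hypothetical) sampled on a 5 x 5 grid.
    def truth(x, y):
        return 0.01 + 0.02 * x - 0.01 * y + 0.005 * x * x + 0.003 * x * y - 0.004 * y * y

    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    stars = [(x, y, truth(x, y)) for x in grid for y in grid]
    coeffs = fit_psf_model(stars, order=2)
    print(eval_psf_model(coeffs, 0.3, -0.7), truth(0.3, -0.7))
```

One such fit would be made per modelled quantity and per ellipticity or tensor component, with the fitted model then evaluated at each galaxy position.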

A4 Isotropic $P^\gamma$ correction

The application of the anisotropic PSF correction leaves an
effectively isotropic distortion, making objects rounder as a result
of both the PSF and the Gaussian weight function used to measure
the galaxy shapes. To correct for this effect and convert weighted
galaxy ellipticities $\varepsilon$ into unbiased shear estimators $\gamma$, we
use the pre-seeing shear polarisability tensor $P^\gamma$ (defined with
the KSB+ method in Section 2.1). $P^\gamma$ is calculated for each galaxy
from the measured galaxy smear and shear polarisability tensors,
$P^{\rm sm}$ and $P^{\rm sh}$, and a term that is dependent on stellar smear
and shear polarisability tensors, $(P^{\rm sm*})^{-1} P^{\rm sh*}$.
Brown (MB) and the first method of Clowe (C1) use the stellar
smear and shear polarisability tensors measured with a Gaussian
weight of scale size $r_g = r_g^*$. Hetterscheidt (MH), Heymans (CH),
Hoekstra (HH), Schrabback (TS), Van Waerbeke (LV) and the second
method of Clowe (C2) calculate this stellar term as a function
of smoothing scale $r_g$. Comparing the C1 and C2 results therefore
demonstrates the impact of the inclusion of scale size at this stage.

$P^\gamma$ is a very noisy quantity, especially for small galaxies. This
noise is reduced somewhat by treating $P^\gamma$ as a scalar equal to half
its trace (note that the off diagonal terms of $P^\gamma$ are typically an
order of magnitude smaller than the diagonal terms). None of the
methods tested in this analysis uses the full $P^\gamma$ tensor correction
(see Erben et al. 2001 to compare the results achieved when using
a tensor and scalar $P^\gamma$ correction). In an effort to reduce the
noise on $P^\gamma$ still further, $P^\gamma$ is often fit as a function of
$r_g$, although note that this fitting process has recently been
shown, with the Brown (MB) pipeline, to be dependent on which
significance cuts are made when selecting galaxies (Massey et al.
2005). Table A1 details which method is used by each author. In the
case of Clowe (C1&2), $P^\gamma$ is also fit as a function of $\varepsilon$, and
with the method of Van Waerbeke, $P^\gamma$ is also fit as a function of
magnitude.

In real data Hoekstra (HH) has previously found a clear
dependence of $P^{\rm sh}$ on $\varepsilon$. To correct for this shape dependence the
Hoekstra pipeline multiplies $P^{\rm sh}$ by $(1 - \varepsilon^2/2)$ at the
$P^\gamma$ correction stage. This modification is not used in any of the
other analyses.
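
The scalar correction described in Section A4 can be written out in a few lines. The sketch assumes the standard KSB+ form $P^\gamma = P^{\rm sh} - P^{\rm sm}(P^{\rm sm*})^{-1}P^{\rm sh*}$ and, as a further simplifying assumption of this example, replaces the stellar term by the ratio of traces ${\rm Tr}[P^{\rm sh*}]/{\rm Tr}[P^{\rm sm*}]$. Function names and numbers are illustrative only.

```python
def trace(m):
    """Trace of a 2 x 2 tensor stored as nested lists."""
    return m[0][0] + m[1][1]


def p_gamma_scalar(p_sh, p_sm, p_sh_star, p_sm_star):
    """Half the trace of P^gamma = P^sh - P^sm * (stellar ratio).

    The stellar term (P^sm*)^-1 P^sh* is approximated here by the scalar
    ratio Tr[P^sh*] / Tr[P^sm*] (an assumption of this sketch).
    """
    star_ratio = trace(p_sh_star) / trace(p_sm_star)
    p_gamma = [[p_sh[i][j] - star_ratio * p_sm[i][j] for j in range(2)]
               for i in range(2)]
    return 0.5 * trace(p_gamma)


def shear_estimate(e_cor, p_gamma):
    """Shear estimator: anisotropy-corrected ellipticity divided by P^gamma."""
    return e_cor[0] / p_gamma, e_cor[1] / p_gamma


if __name__ == "__main__":
    # Toy tensors (hypothetical values).
    p_sh = [[0.8, 0.01], [0.01, 0.7]]
    p_sm = [[0.2, 0.0], [0.0, 0.3]]
    p_sh_star = [[1.0, 0.0], [0.0, 1.0]]
    p_sm_star = [[0.5, 0.0], [0.0, 0.5]]
    pg = p_gamma_scalar(p_sh, p_sm, p_sh_star, p_sm_star)
    print(pg, shear_estimate((0.05, -0.025), pg))
```

Because the shear estimate divides by $P^\gamma$, noise in small values of $P^\gamma$ is amplified, which is why the fitting and cuts on $P^\gamma$ listed in Table A1 matter.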

KSB author                Brown                       Clowe                       Clowe                       Hetterscheidt
Key                       MB                          C1                          C2                          MH
Source detection          hfindpeaks                  hfind + SExt                hfind + SExt                SExtractor
PSF: 2D polynomial model  2nd order                   3rd order                   3rd order                   3rd order
                          to ε* and P^sm*, P^sh*      to ε*, P^sm*, P^sh*         to ε*, P^sm*, P^sh*,        to p_μ(r_g*),
                                                                                  (P^sh/P^sm)(r_g)            3.5σ clipping
Galaxy size r_g           from hfindpeaks             from hfindpeaks             from hfindpeaks             flux_radius
Quadrupole estimate       Approx                      Approx                      Approx                      Interpolation
θ_max and Δθ              Int[4r_g], 1 pix            Int[3r_g], 1 pix            Int[3r_g], 1 pix            3r_g, 0.25 pix
P^sh and P^sm estimate    Approx                      Approx                      Approx                      Interpolation
P^γ correction            Fit of ½Tr[P^γ](r_g)        (P^sh/P^sm)(r_g*)           Fit P^γ(r_g, ε),            ½Tr[P^γ] (no fit)
                                                                                  (P^sh/P^sm)(r_g)
Weights                   none                        ⟨γ²⟩⁻¹(r_g, ν)              ⟨γ²⟩⁻¹(r_g, ν)              ⟨γ²⟩⁻¹(r_g, mag)
γ correction              Calibration                 Close-pair                  Close-pair                  -
                          γ_cor = γ/0.85              γ_cor = γ/0.95              γ_cor = γ/0.95
Ellipticity cut           |ε_obs| ≤ 0.5               -                           -                           |ε_obs| ≤ 0.8
Size cut                  r_g > r_g*                  r* < r_g < 6 pix            r* < r_g < 6 pix            -
Significance cut          ν > 5                       ν > 10                      ν > 10                      -
P^γ cut                   -                           P^γ ≥ 0.15                  P^γ ≥ 0.15                  ½Tr[P^γ] > 0
γ cut                     -                           -                           -                           -
Other                     -                           |d| < 1 pix                 |d| < 1 pix                 |d| < 3 pix
                                                      SEx class < 0.8             SEx class < 0.8
                                                      No sat/bad pix              No sat/bad pix

KSB author                Heymans                     Hoekstra                    Schrabback                  Van Waerbeke
Key                       CH                          HH                          TS                          LV
Source detection          SExtractor                  hfindpeaks                  SExtractor                  SExtractor
PSF: 2D polynomial model  2nd order                   2nd order                   3rd order                   2nd order
                          to p_μ(r_g) and             to ε*(r_g),                 to p_μ(r_g)                 to ε*(r_g),
                          (P^sm*)⁻¹ P^sh*(r_g)        P^sm*(r_g) and P^sh*(r_g)                               P^sm*(r_g) and P^sh*(r_g)
Galaxy size r_g           flux_radius                 from hfindpeaks             flux_radius                 flux_radius
Quadrupole estimate       Approx                      Interpolation               Interpolation               Approx
θ_max and Δθ              Int[4r_g], 1 pix            -                           3r_g, 0.25 pix              Int[4r_g], 1 pix
P^sh and P^sm estimate    Integer                     Interpolation               Interpolation               Approx
P^γ correction            ½Tr[P^γ] (no fit)           P^sh → (1 − ε²/2) P^sh      ½Tr[P^γ] (no fit)           Fit in (r_g, mag)
                                                      Fit to ½Tr[P^γ](r_g)                                    to ½Tr[P^γ]
Weights                   none                        Hoekstra et al.             ⟨γ²⟩⁻¹(r_g, mag)            Hoekstra et al.
                                                      eqn A8,9                                                eqn A8,9
γ correction              -                           -                           -                           -
Ellipticity cut           |ε_obs| ≤ 0.5               -                           |ε_cor| ≤ 0.8               -
Size cut                  1.2r* < r_g < 7 pix         selection                   r_h > 1.2r*                 -
Significance cut          ν > 10                      ν > 5                       -                           ν > 15
P^γ cut                   -                           -                           ½Tr[P^γ] > 0                -
γ cut                     |γ| < 2                     -                           -                           -
Other                     Close pairs < 10 pix        -                           |d| < 3 pix                 -
                          removed

Table A1. The stages implemented by different authors using the KSB+ method described in Section 2.1. Table notation: pix = pixel units; P(r_g) implies that
parameter P is measured as a function of scale size r_g; P(r_g*) implies that parameter P is measured at the stellar scale size r_g*. See the Appendix text for
more details.

A5 Weights

Some authors employ a weighting scheme in their analysis.
Hoekstra (HH) and Van Waerbeke (LV) use weights based on the error
in the ellipticity measurement. These weights are derived in
Appendix A1 of Hoekstra et al. (2000). Clowe (C1&2), Hetterscheidt
(MH) and Schrabback (TS) use a weighting scheme based on the
inverse of $\langle\gamma^2\rangle$, computed from all galaxies within a
given distance in $r_g$ and magnitude (TS, MH) or significance $\nu$
(C1&2) of the galaxy, using a minimum of 20/50 (TS, MH/C1&2)
galaxies. Note that this type of weighting applied to galaxies that
have experienced a constant shear will introduce a stronger bias than
when the same weights are applied to data where the shear varies.
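
The neighbour-based weighting of Section A5 (the inverse of $\langle\gamma^2\rangle$ over galaxies of similar size and magnitude) might be sketched as below. The nearest-neighbour search in an unscaled ($r_g$, magnitude) plane, the inclusion of the galaxy itself among its neighbours and the function name are all assumptions of this illustration; the individual pipelines differ in these details.

```python
def neighbour_weights(r_g, mag, gamma, n_neighbours=20):
    """Weight for each galaxy: 1 / <|gamma|^2> over its n nearest neighbours
    in the (r_g, mag) plane, the galaxy itself included.

    Brute-force O(N^2 log N) search, adequate only for a small illustration.
    """
    weights = []
    for k in range(len(gamma)):
        dist = sorted(
            ((r_g[k] - r_g[i]) ** 2 + (mag[k] - mag[i]) ** 2, i)
            for i in range(len(gamma))
        )
        nearest = [i for _, i in dist[:n_neighbours]]
        mean_g2 = sum(gamma[i][0] ** 2 + gamma[i][1] ** 2 for i in nearest) / len(nearest)
        weights.append(1.0 / mean_g2)
    return weights


if __name__ == "__main__":
    # Toy catalogue (hypothetical): a quiet bright clump and a noisy faint clump.
    r_g = [2.0, 2.1, 1.9, 5.0, 5.1, 4.9]
    mag = [22.0, 22.1, 21.9, 25.0, 25.1, 24.9]
    gamma = [(0.1, 0.0)] * 3 + [(0.2, 0.0)] * 3
    print(neighbour_weights(r_g, mag, gamma, n_neighbours=3))
```

In the toy example the noisier clump receives a weight four times smaller, since its $\langle\gamma^2\rangle$ is four times larger.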

A6 Selection criteria and calibration correction 

After applying the KSB+ method to the data each author has
included a set of selection criteria, listed in Table A1. These
criteria are based on object significance $\nu$, 'optimal' size $r_g$,
half light radius $r_h$, observed ellipticity $\varepsilon^{\rm obs}$, corrected
ellipticity $\varepsilon^{\rm cor}$, measured shear $\gamma$, SExtractor stellar
class (1 = star, 0 = galaxy), measured/modeled $P^\gamma$ and so on. The
imcat software getshapes determines the offset of the flux averaged
galaxy centroid (first moment) from the given input galaxy centroid,
scaled by the galaxy flux. This measure, $d$, is used by Clowe (C1&2)
to select 'good' galaxies. A similar selection criterion is included
in the methods of Hetterscheidt (MH) and Schrabback (TS), where
objects are only selected if their iterative refinement of the
centroid position converges and fixes the position to better than
$2 \times 10^{-3}$ pixels independently in $x$ and $y$. imcat also flags
up saturated and bad pixels, which add noise to the quadrupole
moments. Clowe (C1&2) removes galaxies with any saturated or bad
pixels within $3r_g$ of the centroid.

Brown (MB) includes a calibration correction
$\gamma_{\rm cor} = \gamma/0.85$, as suggested from the analysis of image
simulations in Bacon et al. (2001). Clowe (C1 & C2) includes a
close-pair calibration correction $\gamma_{\rm cor} = \gamma/0.95$ to
account for the diluting effect of blended objects. Normally Clowe
visually inspects data to remove double objects classified as a
single source and sources with tidal tails, in addition to optical
defects such as stellar spikes and satellite trails. This is feasible
with the typical amounts of data analysed in cluster lensing
analyses. For wide-field cosmic shear surveys, however, visual
inspection becomes rather time consuming. For this analysis Clowe
therefore visually inspected 10 images from the simulation,
resulting in the rejection of ~5% of the objects. This process
was found to increase the average shear measured in the visually
inspected images by ~5%. Thus Clowe includes a close-pair
correction factor in the STEP analysis to account for this effect in
the whole simulation set.
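
For concreteness, the combination of selection cuts and a multiplicative calibration correction can be sketched as follows. The cut values are those listed for Clowe (C1&2) in Table A1 ($r^* < r_g < 6$ pix, $\nu > 10$, $|d| < 1$ pix, SExtractor class < 0.8), with the $P^\gamma$ and bad-pixel cuts omitted; the dictionary keys and function name are hypothetical, and this is not Clowe's code.

```python
def select_and_calibrate(catalogue, calibration=0.95):
    """Apply illustrative selection cuts, then rescale shear by 1/calibration.

    Each catalogue entry is a dict with keys nu, r_g, r_star, class_star, d
    and gamma (a pair of shear components); the key names are hypothetical.
    """
    selected = []
    for obj in catalogue:
        if obj["nu"] <= 10 or not (obj["r_star"] < obj["r_g"] < 6.0):
            continue  # fails the significance or size cut
        if obj["class_star"] >= 0.8 or abs(obj["d"]) >= 1.0:
            continue  # star-like, or centroid offset too large
        g1, g2 = obj["gamma"]
        selected.append((g1 / calibration, g2 / calibration))
    return selected


if __name__ == "__main__":
    # Toy catalogue (hypothetical): only the first object passes all cuts.
    catalogue = [
        {"nu": 12.0, "r_g": 3.0, "r_star": 2.0, "class_star": 0.1, "d": 0.2,
         "gamma": (0.095, -0.019)},
        {"nu": 8.0, "r_g": 3.0, "r_star": 2.0, "class_star": 0.1, "d": 0.2,
         "gamma": (0.3, 0.1)},
        {"nu": 15.0, "r_g": 1.5, "r_star": 2.0, "class_star": 0.1, "d": 0.2,
         "gamma": (0.2, 0.2)},
    ]
    print(select_and_calibrate(catalogue))
```

Dividing by 0.95 raises every retained shear by about 5%, matching the increase found in the visually inspected images.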



